Automated Building and Storing Frozen Data in R Packages Using Travis and Drat.

Abstract

Clean, accessible, frozen data can be hard to come by in the wild, where data are often described as ‘as is’ or ‘as was’. Direct database access can be like using a tank to kill a mosquito. We propose using Travis and the drat R package to automatically build a version-controlled repository of frozen data R packages. Package builds can be set up to automatically access secure databases, clean data, and transform data on a daily, monthly, or yearly schedule, freezing the result in an R data package. We walk through this process and give examples of using these frozen R data packages to build automated reports, run ad hoc analyses, and audit previous data processes. With the addition of Travis, we can build, check, and deploy our packages. We can also automate checks for data-merging conflicts with previous frozen versions, such as the movement of business entities within the business hierarchy. We conclude with a proposed workflow for implementing and deploying this automated repository.

Date
Feb 15, 2019 2:00 PM
Location
New Orleans, Louisiana
Ben Barnard
Data Scientist

My research interests include the intersection of common sense and data science, statistics education, and the art of consulting on data science projects.