A target factory for CSV lists of targets

Open: nsheff opened this issue 4 years ago • 1 comment

One of my use cases for unitar is that I have a bunch of general-purpose files that I re-use across lots of R projects. I also use these files outside of R.

For my R projects, I process the files in various ways and then re-use these derived files across many projects. I want to use targets to track the processing and caching. Then, I want to use unitar to think of this as a central repository with targets I re-use across many projects.

To make this simpler for me, I created a new target factory that builds targets from a list in a CSV file. The CSV contains one row per target. For example, each row can correspond to one of my resource files: it tracks that file and specifies a function for loading it into R. I like this because the CSV file helps me keep track of each of my resource files, and specifying targets this way feels convenient to me. In addition, a row in the CSV can also correspond to an R function call that processes data in some other way.
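To make the idea concrete, here is a minimal sketch of what such a CSV-to-target factory could look like. It is not the implementation in unitar: the function name `tar_csv_sketch()`, the column names `name` and `command`, and the placeholder functions `get_data()` and `analyze()` are all assumptions made for illustration. It relies only on `targets::tar_target_raw()`, which accepts a target name as a string and a command as an unevaluated expression.

```r
library(targets)

# Hypothetical factory (not unitar's actual function): builds one target per
# CSV row. Assumed columns: `name` (target name), `command` (R code as text).
tar_csv_sketch <- function(path) {
  rows <- utils::read.csv(path, stringsAsFactors = FALSE)
  lapply(seq_len(nrow(rows)), function(i) {
    tar_target_raw(
      name = rows$name[i],
      # str2lang() turns the text of one expression into an unevaluated call.
      command = str2lang(rows$command[i])
    )
  })
}

# Demo with a throwaway CSV; get_data() and analyze() are placeholder names.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,command", "data,get_data()", "analysis,analyze(data)"),
  csv_path
)
demo_targets <- tar_csv_sketch(csv_path)
length(demo_targets)
```

In a `_targets.R` file, the list returned by a factory like this would be the last expression, in place of a hand-written list of `tar_target()` calls. A command that contains commas would need to be quoted in the CSV so that `read.csv()` keeps it in a single field.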

I created a demo repository to show how this works here: unitar resources demo. For now I put this target factory in the unitar package, but I'm now realizing it might be a more general concept. While my original intent was to use it for a "resources repository" that works with unitar's cross-project concept, in fact, it's really just a way to specify targets using a CSV file instead of traditional R functions.

@wlandau, I'm curious to hear your thoughts on this kind of a CSV-to-target factory.

May 06 '21 20:05 nsheff

This is similar to the concept of a drake plan, which is a data frame.

```r
library(drake)
drake_plan(data = get_data(), analysis = analyze(data))
#> # A tibble: 2 x 2
#>   target   command
#>   <chr>    <expr_lst>
#> 1 data     get_data()
#> 2 analysis analyze(data)
```

Unlike targets::tar_manifest(), which is for display purposes only, drake_plan() produces the actual input to drake::make(). I originally thought the data frame representation would be handy for metaprogramming, but few users picked it up that way. Most preferred to work through the drake_plan() DSL rather than use dplyr on the generated plan to create pipelines. But for file-based workflows, where you are pretty much working with text, a CSV/data frame format might turn out to be convenient.

May 07 '21 13:05 wlandau