ability to rename columns?

Open cboettig opened this issue 7 years ago • 3 comments

This package is very exciting, great work! Our dataspice team was thrilled to see how nicely it already handles common requests on a dataspice.json file, e.g.:

library(roomba)
json <- jsonlite::read_json("https://raw.githubusercontent.com/ropenscilabs/dataspice/master/inst/metadata-tables/dataspice.json")

## Works nicely when all columns come from same level of nesting:
json %>% roomba(c("givenName", "familyName"))
json %>% roomba(c("value", "unitText", "description"))

It would be great if the cols argument could take a named list that could also rename the output columns on the fly:

json %>% roomba(c("value", units = "unitText", "description"))
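
To make the request concrete, here is a rough base R sketch of how a partially named character vector could be split into lookup keys and output column names. `split_cols()` and the toy data frame are hypothetical and not part of roomba; the sketch assumes the result is an ordinary data frame with one column per requested key.

```r
# Hypothetical helper, not part of roomba: split a partially named character
# vector into the keys to look up and the names to give the output columns.
split_cols <- function(cols) {
  new_names <- names(cols)
  if (is.null(new_names)) new_names <- rep("", length(cols))
  # Unnamed entries keep their original key as the column name.
  new_names[new_names == ""] <- cols[new_names == ""]
  list(keys = unname(cols), names = new_names)
}

spec <- split_cols(c("value", units = "unitText", "description"))
spec$keys   # keys to search for
spec$names  # column names for the output

# Stand-in for the data frame that roomba() would return (made-up values).
df <- data.frame(value = 1:2, unitText = c("m", "kg"), description = c("a", "b"))
df <- df[spec$keys]
names(df) <- spec$names
```

Doing the rename on the returned data frame keeps the search itself untouched, which is why it could live outside the core lookup.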

cboettig avatar May 24 '18 18:05 cboettig

@cboettig nice idea!

The interesting thing about the workhorse function that roomba is wrapped around, dfs_idx, is that we can even include a call to filter based on some value, e.g. "givenName" == "Bob". Right now we just check whether there is any value at all with has_good_stuff() inside roomba().

For this v1 we decided not to allow the user to filter in the roomba() call since we weren't sure how the syntax would work. Say you wanted "value" > 42 & "value" <= 100. (What we could do is allow the user to pass a list of all the conditions that they want to be true and then string them together, separated by &s if keep == all and |s if keep == any, before passing the conditions into a revamped has_good_stuff.)
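
One way to read that parenthetical, sketched in base R: the conditions are a list of predicate functions, and `keep` is the function `all` or `any` used to combine their results. This is not the actual has_good_stuff(); `check_conditions()` and its arguments are hypothetical names used only for illustration.

```r
# Hypothetical helper: apply every condition to a value and combine the
# results with `all` (conditions joined by &) or `any` (joined by |).
check_conditions <- function(x, conditions, keep = all) {
  results <- vapply(conditions, function(cond) isTRUE(cond(x)), logical(1))
  keep(results)
}

conditions <- list(
  function(v) v > 42,
  function(v) v <= 100
)

check_conditions(50, conditions, keep = all)   # both conditions hold
check_conditions(150, conditions, keep = all)  # second condition fails
check_conditions(150, conditions, keep = any)  # first condition still holds
```

Passing predicates as functions sidesteps the question of how to parse a string like "value" > 42, at the cost of a more verbose call.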

Renaming feels like a nice-to-have, but if we were to make the inputs to roomba() allow for more complexity, it feels like a filter might be more useful than a rename.

Would love to hear what you think about allowing for filtering and concatenating the conditionals!

aedobbyn avatar May 25 '18 15:05 aedobbyn

Very cool. Yeah, it makes total sense to keep the syntax simple even at the cost of features -- we already have a bunch of feature-rich queries like jq that most of us find too complex for regular use.

So I agree that it doesn't make sense to start adding too many different optional arguments inside the roomba() function. I do wonder though if a dplyr-esque pipeline syntax might be possible, e.g. a roomba() %>% filter() %>% rename() kind of deal. (This might need a lazy-eval strategy where the piped commands are stored and combined into a single operation around a dfs_idx call?)

cboettig avatar May 25 '18 17:05 cboettig

Ah that's a really cool idea! We could even store all the operations the user wants to do and then carry them out in something like

roomba_plan <- 
  roomba(cols = c(x, y, z)) %>% 
  filter(x > 4) %>% 
  rename(a = x)

df %>% roomba_execute(roomba_plan)

I suppose it doesn't matter performance-wise whether the rename happens in roomba_plan or after roomba_execute(), but the filter I can see being useful to pass to dfs_idx rather than doing it after.
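
A toy version of the store-then-execute idea, in base R only. All of the `plan_*` functions below are hypothetical and none of them exist in roomba. The sketch also applies the filter to a flat data frame after column selection, whereas the real benefit would come from handing the filter to dfs_idx during the search.

```r
# Hypothetical sketch: a "plan" is a list of recorded steps; nothing runs
# until plan_execute() is called.
plan_new <- function(cols) {
  structure(list(cols = cols, steps = list()), class = "toy_plan")
}

plan_filter <- function(plan, pred) {
  force(pred)  # evaluate the argument now so the stored step is stable
  plan$steps <- c(plan$steps, list(function(df) df[pred(df), , drop = FALSE]))
  plan
}

plan_rename <- function(plan, ...) {
  new <- c(...)  # named vector of the form new_name = "old_name"
  plan$steps <- c(plan$steps, list(function(df) {
    names(df)[match(new, names(df))] <- names(new)
    df
  }))
  plan
}

plan_execute <- function(df, plan) {
  df <- df[plan$cols]                      # select the requested columns
  for (step in plan$steps) df <- step(df)  # replay the stored steps in order
  df
}

plan <- plan_new(c("x", "y"))
plan <- plan_filter(plan, function(df) df$x > 4)
plan <- plan_rename(plan, a = "x")

# Made-up input data for illustration.
df <- data.frame(x = c(1, 5, 9), y = c("p", "q", "r"), z = 0)
res <- plan_execute(df, plan)
```

Because each step is just a stored function, the same plan can be reused on different inputs, and the steps could in principle be inspected and merged before anything is executed.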

aedobbyn avatar May 26 '18 17:05 aedobbyn