Support `vars` option in get_file/get_dataframe
`vars` should be an argument that subsets the columns of the dataset to pull. However, it appears to have no effect: the whole dataset is returned.
library(dataverse)
df_tab_all <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset = "doi:10.70122/FK2/HXJVJU",
    server = "demo.dataverse.org"
  )
df_tab_vars <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset = "doi:10.70122/FK2/HXJVJU",
    server = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )
# the full download should be larger than the two-column subset
stopifnot(object.size(df_tab_all) > object.size(df_tab_vars))
#> Error: object.size(df_tab_all) > object.size(df_tab_vars) is not TRUE
# does it work on get_dataframe?
df_tab_vars <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset = "doi:10.70122/FK2/HXJVJU",
    server = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )
#> Downloading ingested version of data with readr::read_tsv. To download the original version and remove this message, set original = TRUE.
#> Rows: 15 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (6): player, position, height, dob, country_birth, college
#> dbl (3): number, weight, experience_years
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ncol(df_tab_vars)
#> [1] 9
Created on 2022-01-12 by the reprex package (v2.0.1)
EDITED 2022-01-12 with the new version of dataverse, which now avoids errors and fixes the reprex.
While this is open, we should just delete the `vars` argument to avoid confusion.
I tried to obtain the first column of https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/PPIAXE with
get_file_by_name(
  filename = "nlsw88.tab",
  dataset = "10.70122/FK2/PPIAXE",
  vars = 1, original = FALSE,
  server = "demo.dataverse.org", return_url = TRUE
)
which gives https://demo.dataverse.org/api/access/datafile/1734017?vars=1
So the vars argument does seem to be attached properly, following the format example here.
However, using that URL still gives me the whole dataset, instead of only the first column.
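Until `vars` works, one possible workaround is to download the full file and subset the columns locally. This is only a sketch under stated assumptions: `subset_vars` is a hypothetical helper, not a function in the dataverse package, and the small data frame below uses made-up placeholder values to stand in for the result of `get_dataframe_by_name()`, so the example runs without a network connection.

```r
# Hypothetical helper (NOT part of the dataverse package): keep only the
# requested columns of a data frame that has already been downloaded.
# `vars` may be a character vector of column names or a numeric vector
# of column positions.
subset_vars <- function(df, vars) {
  if (is.character(vars)) {
    # Fail with a readable message if a requested name does not exist
    missing_vars <- setdiff(vars, names(df))
    if (length(missing_vars) > 0) {
      stop("Columns not found: ", paste(missing_vars, collapse = ", "))
    }
  }
  # drop = FALSE keeps a data frame even when a single column is selected
  df[, vars, drop = FALSE]
}

# Placeholder data (made-up values); in practice this would be the data
# frame returned by get_dataframe_by_name(...)
roster <- data.frame(
  number   = c(1, 2, 3),
  player   = c("A", "B", "C"),
  position = c("G", "F", "C")
)

roster_small <- subset_vars(roster, c("number", "player"))
ncol(roster_small)
#> [1] 2
```

This does not reduce the amount of data transferred, since the whole file is still downloaded; it only gives the two-column result that `vars` was expected to return.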