dataverse-client-r

Support `vars` option in get_file/get_dataframe

Open kuriwaki opened this issue 5 years ago • 2 comments

`vars` should be an argument that subsets the columns of the dataset to pull. However, it currently has no effect: the whole dataset is returned regardless.

library(dataverse)

df_tab_all <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org"
  )

df_tab_vars <-
  get_file_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )

# the full download should be larger than the two-column subset
stopifnot(object.size(df_tab_all) > object.size(df_tab_vars))
#> Error: object.size(df_tab_all) > object.size(df_tab_vars) is not TRUE


# does it work on get_dataframe?
df_tab_vars <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    vars = c("number", "player") # only two columns
  )
#> Downloading ingested version of data with readr::read_tsv. To download the original version and remove this message, set original = TRUE.
#> Rows: 15 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (6): player, position, height, dob, country_birth, college
#> dbl (3): number, weight, experience_years
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ncol(df_tab_vars)
#> [1] 9

Created on 2022-01-12 by the reprex package (v2.0.1)

EDITED 2022-01-12 with a new version of dataverse, which now avoids errors and fixes the reprex.

kuriwaki, Jan 29 '21 18:01

While this is open, we should just delete the `vars` argument to avoid confusion.

kuriwaki, Oct 16 '24 19:10

I tried to obtain the first column of https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/PPIAXE with

get_file_by_name(
    filename = "nlsw88.tab",
    dataset  = "10.70122/FK2/PPIAXE",
    vars = 1, original = FALSE,
    server   = "demo.dataverse.org", return_url = TRUE)

This gives https://demo.dataverse.org/api/access/datafile/1734017?vars=1, so the `vars` argument does seem to be attached properly, following the format example here.

However, using that URL still gives me the whole dataset, instead of only the first column.
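
Until server-side subsetting works, one possible workaround is to download the full file and drop the unwanted columns locally. Below is a minimal sketch, assuming a hypothetical helper `select_vars()` (not part of the dataverse package) and a toy data frame standing in for the downloaded roster; its values are illustrative only and are not taken from the file.

```r
# Hypothetical helper, NOT part of the dataverse package: keep only the
# requested columns of an already-downloaded data frame.
select_vars <- function(df, vars) {
  # For column names, fail early with a readable message if any are absent.
  if (is.character(vars)) {
    missing_vars <- setdiff(vars, names(df))
    if (length(missing_vars) > 0) {
      stop("Columns not found: ", paste(missing_vars, collapse = ", "))
    }
  }
  # drop = FALSE keeps a data frame even when a single column is selected.
  df[, vars, drop = FALSE]
}

# Toy stand-in for the full download (assumption: in practice this would be
# the result of get_dataframe_by_name() called without `vars`).
roster <- data.frame(
  number   = c(23, 33, 91),
  player   = c("Michael Jordan", "Scottie Pippen", "Dennis Rodman"),
  position = c("G", "F", "F")
)

roster_small <- select_vars(roster, c("number", "player"))
ncol(roster_small)
#> [1] 2
```

This only trims the result after the fact. It does not reduce the amount of data transferred, which is presumably the point of a working `vars` argument.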

kuriwaki, Oct 16 '24 23:10