
Identifying type of column

Open lukeReilly opened this issue 5 years ago • 4 comments

Is there a way to identify the type of a column before reading it into memory, as I don't seem to be able to find it?

I guess one could try reading in one chunk of the disk.frame to find out but I was wondering if disk.frame keeps track of this anywhere?

lukeReilly avatar Aug 10 '20 13:08 lukeReilly

There isn't anything built in.

The fastest way is probably

typeof(head(diskf)$colName)

For now disk.frame uses fst files so you can also use

fst_files = get_chunk_ids(diskf, full.names = TRUE, strip_extension = FALSE)

metadata = fst::fst.metadata(fst_files[1])

and then interrogate the metadata.
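As a rough sketch of what interrogating the metadata can look like, the example below writes a small throwaway fst file and inspects it with `fst::metadata_fst()`, which to my knowledge is the current name for the older `fst.metadata()`. The temporary file and the toy columns are made up for illustration, and the `columnNames` field is my understanding of the returned object's layout, so check it with `str()` on your installed fst version.

```r
library(fst)

# Illustration only: a throwaway fst file standing in for one disk.frame chunk.
path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(x = 1:3, y = c("a", "b", "c"), z = c(1.5, 2.5, 3.5),
             stringsAsFactors = FALSE),
  path
)

# Reads only the file header, not the column data.
meta <- metadata_fst(path)

print(meta)  # the printed summary lists each column alongside its stored type
str(meta)    # shows the raw fields, useful for checking names on your version

# Assumed field name; verify against the str() output above.
column_names <- meta$columnNames
```

For a disk.frame, the same call can be pointed at `fst_files[1]`, on the assumption that every chunk shares the same column types.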

Given a data.frame, how do you usually get the type? What function do you use? I can implement the same for disk.frame.

xiaodaigh avatar Aug 10 '20 14:08 xiaodaigh

To be clearer, what I really want is an easy way to select columns of a specific type using functions like is.numeric, is.logical, etc.:

library(data.table)
mtcars_dt = data.table(mtcars)
columns_to_select = which(sapply(mtcars_dt, is.numeric))
mtcars_dt[, .SD, .SDcols = columns_to_select]

I guess that using head() like the below may be fine as it shouldn't have any real performance penalties.

mtcars_df = as.disk.frame(mtcars)
columns_to_select = which(sapply(head(mtcars_df), is.numeric))
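The head() approach can be wrapped in a small helper so the predicate is reusable. This is only a sketch: `cols_where` is a hypothetical name, not part of disk.frame, and it assumes column types are the same in every chunk, since only the first chunk is inspected. It uses `iris` rather than `mtcars` so that at least one column is non-numeric.

```r
library(disk.frame)

# Hypothetical helper (not part of disk.frame): returns the names of the
# columns whose values in the first few rows satisfy `predicate`,
# e.g. is.numeric or is.factor.
cols_where <- function(diskf, predicate) {
  first_rows <- head(diskf)  # reads a few rows from the first chunk only
  names(first_rows)[vapply(first_rows, predicate, logical(1))]
}

iris_df <- as.disk.frame(iris)
numeric_cols <- cols_where(iris_df, is.numeric)

# srckeep() restricts which columns are loaded from disk;
# collect() then reads just those columns into memory.
numeric_data <- collect(srckeep(iris_df, numeric_cols))
```

Passing the selected names to `srckeep()` means the non-matching columns are never read, which is the main reason to identify types before loading.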

lukeReilly avatar Aug 10 '20 16:08 lukeReilly

data.table allows functions in the .SDcols argument.

library(data.table)
library(disk.frame)

as.data.table(mtcars)[, .SD, .SDcols = is.numeric]
as.disk.frame(mtcars)[, .SD, .SDcols = is.numeric]

While this does not address the root issue, the two major syntaxes (dplyr and data.table) support column selection approaches like this.

ColeMiller1 avatar Feb 24 '21 03:02 ColeMiller1

"... the two major syntaxes (dplyr and data.table) support column selection approaches like this."

The issue is time and resources. I am quite time poor atm, but if you are game, a new package, disk.frame.dt, containing a data.table implementation would be cool. I can help review.

xiaodaigh avatar Feb 25 '21 11:02 xiaodaigh