Identifying type of column
Is there a way to identify the type of a column before reading it into memory, as I don't seem to be able to find it?
I guess one could try reading in one chunk of the disk.frame to find out but I was wondering if disk.frame keeps track of this anywhere?
There isn't anything built in.
The fastest way is probably
typeof(head(diskf)$colName)
For now disk.frame uses fst files so you can also use
fst_files = get_chunk_ids(diskf, full.names = TRUE, strip_extension = FALSE)
metadata = fst::fst.metadata(fst_files[1])
and then interrogate the metadata.
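To make "interrogate the metadata" a bit more concrete, here is a minimal sketch. It assumes the fst package is installed and uses a temporary fst file written from iris as a stand-in for one chunk of a disk.frame. As far as I know, metadata_fst is the current name for fst.metadata in recent fst releases, and columnNames is one of the fields on the returned object; treat both as assumptions to check against your installed version.

```r
# Assumption: the fst package is installed. The temp file below stands in
# for a single chunk file of a disk.frame.
library(fst)

chunk_path <- tempfile(fileext = ".fst")
write_fst(iris, chunk_path)

# Reads the file's metadata without loading the column data.
metadata <- metadata_fst(chunk_path)

# Printing the metadata lists each column together with its stored type.
print(metadata)

# The column names are also available as a plain character vector.
column_names <- metadata$columnNames
```

With a real disk.frame, chunk_path would be replaced by fst_files[1] from the snippet above.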
Given a data.frame, how do you usually get the type? What function do you use? I can implement the same for disk.frame.
I guess to be clearer, it's more that I would want an easy way to select columns of a specific type by using functions like is.numeric, is.logical, etc.:
library(data.table)
mtcars_dt = data.table(mtcars)
columns_to_select = which(sapply(mtcars_dt, is.numeric))
mtcars_dt[, .SD, .SDcols = columns_to_select]
I guess that using head() like the below may be fine as it shouldn't have any real performance penalties.
mtcars_df = as.disk.frame(mtcars)
columns_to_select = which(sapply(head(mtcars_df), is.numeric))
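The head() approach could be wrapped in a small helper, roughly like the sketch below. cols_of_type is a hypothetical name, not part of disk.frame. The example runs on a plain data.frame (iris) so it works without disk.frame installed, and it assumes that head() on a disk.frame returns an ordinary data.frame of the first rows, which the snippet above already relies on.

```r
# Hypothetical helper (not part of disk.frame): returns the names of the
# columns whose first rows satisfy a type predicate such as is.numeric.
cols_of_type <- function(df, predicate) {
  # Only the first few rows are inspected, so the full data is never loaded.
  sample_rows <- head(df)
  matches <- vapply(sample_rows, predicate, logical(1))
  names(sample_rows)[matches]
}

# A plain data.frame stands in for the disk.frame here.
numeric_cols <- cols_of_type(iris, is.numeric)
factor_cols <- cols_of_type(iris, is.factor)
```

One caveat with inspecting only the first rows: it reports the types as stored in the first chunk, so it only holds for the whole disk.frame if every chunk shares the same column types.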
data.table allows functions in the .SDcols argument.
library(data.table)
library(disk.frame)
as.data.table(mtcars)[, .SD, .SDcols = is.numeric]
as.disk.frame(mtcars)[, .SD, .SDcols = is.numeric]
While this does not address the root issue, the two major syntaxes (dplyr and data.table) support column selection approaches like this.
The issue is time and resources. I am quite time poor atm, but if you are game, a new package disk.frame.dt containing a data.table implementation would be cool. I can help review.