Fixed width files

Open ghost opened this issue 7 years ago • 4 comments

This is an enhancement request, but I can't see how to designate it as such.

disk.frame looks to be wonderfully valuable. Many thanks in advance.

It would be helpful if the CSV-reading capability could be extended to fixed-width files, as these files (often logs and the like) are typically massive.

readr::read_fwf() is a nice implementation of fixed-width input and might serve as a model for something comparable in this package.

Many thanks

ghost avatar Feb 01 '19 20:02 ghost

Sounds useful. The problem with all of these readers is that they don't naturally allow chunk-by-chunk reading. I have made a feature request to the chunked package, which is the only package I know of that reads chunk by chunk.

xiaodaigh avatar Feb 01 '19 23:02 xiaodaigh

@aetiologicCanada can you share a self-contained example of a fwf file and how to read it with readr?

I tried

data(cars)
library(gdata)
write.fwf(cars, "test.fwf")
f = file("test.fwf")
readr::read_fwf("test.fwf", n_max=1)

It doesn't seem to work.

xiaodaigh avatar Feb 01 '19 23:02 xiaodaigh

data(cars)
f = here::here("test.fwf")

# write.fwf() writes a header row of column names by default
gdata::write.fwf(cars, f)

# Skip the header row and read two columns by character position
junk <- readr::read_fwf(f, skip = 1, readr::fwf_positions(
  start = c(1,4),
  end   = c(2,6),
  col_names = c("A", "B")
))

aetiologicCanada avatar Feb 05 '19 18:02 aetiologicCanada

Maybe log an issue with readr so they can provide a read_fwf_chunked function, analogous to readr::read_csv_chunked. Once they have that, we can use disk.frame::add_chunk to easily create a disk.frame.

xiaodaigh avatar Aug 05 '19 06:08 xiaodaigh
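
Until a chunked fixed-width reader exists, the chunk-by-chunk idea discussed above can be approximated in base R. The following is only a minimal sketch under stated assumptions: `read_fwf_in_chunks` and all of its arguments are hypothetical names, not part of readr or disk.frame. It reads a fixed number of lines at a time from an open connection, slices each line by character position, and passes every chunk to a callback. Column types are guessed per chunk, so they may differ between chunks on real data.

```r
# Hypothetical helper (not part of readr or disk.frame): read a fixed-width
# file in chunks of `chunk_rows` lines and pass each chunk to `callback`.
read_fwf_in_chunks <- function(path, starts, ends, col_names,
                               chunk_rows = 10000L, skip = 0L,
                               callback = function(chunk) NULL) {
  con <- file(path, open = "r")
  on.exit(close(con))
  if (skip > 0L) readLines(con, n = skip)  # discard header lines, if any
  n_chunks <- 0L
  repeat {
    # An open connection keeps its position, so each call continues
    # where the previous one stopped.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    # Cut every line at the given character positions, one column at a time,
    # then let R guess the column type for this chunk.
    fields <- lapply(seq_along(starts), function(i) {
      utils::type.convert(trimws(substr(lines, starts[i], ends[i])),
                          as.is = TRUE)
    })
    names(fields) <- col_names
    callback(as.data.frame(fields, stringsAsFactors = FALSE))
    n_chunks <- n_chunks + 1L
  }
  invisible(n_chunks)
}

# Small demonstration: write `cars` as fixed-width text (columns 1-3 and 4-7)
tmp <- tempfile(fileext = ".fwf")
writeLines(sprintf("%-3d%-4d", as.integer(cars$speed), as.integer(cars$dist)),
           tmp)

# Collect the chunks in a list; 20 rows per chunk is an arbitrary choice
chunks <- list()
n <- read_fwf_in_chunks(
  tmp, starts = c(1, 4), ends = c(3, 7), col_names = c("speed", "dist"),
  chunk_rows = 20L,
  callback = function(chunk) chunks[[length(chunks) + 1L]] <<- chunk
)
```

With disk.frame installed, the callback could presumably be `function(chunk) disk.frame::add_chunk(df, chunk)`, where `df` was created beforehand with `disk.frame::disk.frame()` on an output folder. That combination is untested here.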