bigmemory

read from gzfile

Open • mvaudel opened this issue 8 years ago • 6 comments

Hi,

Thank you for this useful package. I usually read my matrices from text files using read.big.matrix. Would it be possible to support input from gzipped files (gzfile)?

Best regards,

Marc

mvaudel, Jun 13 '17 13:06

@mvaudel A quick and dirty solution would be to uncompress the file with gunzip and then read the resulting file in as a big.matrix. Otherwise, I'm not aware of any R interface that reads directly from gzipped files. If such an interface exists, we could certainly explore it; if not, I think we will likely refer users to uncompress the file themselves (assuming the other authors agree).

cdeterman, Jun 13 '17 13:06
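A minimal base-R sketch of the uncompress-then-read workaround suggested above. Rather than calling an external gunzip, it streams the decompressed text through a gzfile() connection into a temporary file in chunks of lines (so memory use is bounded by the chunk size, not the file size), then passes that file to read.big.matrix(). The helper name gunzip_to_tempfile, the file demo.txt.gz, the comma separator, and the chunk size are illustrative assumptions; a small demo file is generated first so the sketch runs on its own. This approach still needs enough free disk space for the uncompressed copy.

```r
library(bigmemory)

# Hypothetical helper (not part of bigmemory): decompress a .gz text file into a
# temporary plain-text file, chunk by chunk, and return the temporary path.
gunzip_to_tempfile <- function(gz_path, chunk_lines = 100000L) {
  out_path <- tempfile(fileext = ".txt")
  con_in <- gzfile(gz_path, open = "rt")
  con_out <- file(out_path, open = "wt")
  on.exit({ close(con_in); close(con_out) })
  repeat {
    lines <- readLines(con_in, n = chunk_lines)  # at most chunk_lines lines in RAM
    if (length(lines) == 0L) break               # end of the compressed stream
    writeLines(lines, con_out)
  }
  out_path
}

# Demo input (assumption): a 1000 x 5 numeric matrix written as gzipped CSV.
gz_path <- file.path(tempdir(), "demo.txt.gz")
m <- matrix(seq_len(5000), nrow = 1000, ncol = 5)
con <- gzfile(gz_path, open = "wt")
write.table(m, con, sep = ",", row.names = FALSE, col.names = FALSE)
close(con)

# Uncompress to a temporary file, then read it as a big.matrix.
txt_path <- gunzip_to_tempfile(gz_path)
X <- read.big.matrix(txt_path, sep = ",", header = FALSE, type = "double")
dim(X)
```

For a file-backed result on very large inputs, the usual backingfile and descriptorfile arguments of read.big.matrix() can be added to the same call.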

Thank you for your answer. It would be really convenient to read directly from the gzipped files: our files are quite large, so reading them directly and decompressing on the fly would save a substantial amount of time and disk space. Are you working on the files themselves or using a connection? If the latter, letting us pass the connection directly instead of the file name should do the trick (https://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html).

mvaudel, Jun 13 '17 15:06
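If writing an uncompressed copy is not an option, one possible alternative is to fill a big.matrix directly from a gzfile() connection. This sketch does not rely on read.big.matrix() accepting a connection; instead it makes two passes over the compressed file: the first counts rows and columns, the second parses chunks of lines with scan() and copies them into a preallocated big.matrix. It assumes a purely numeric, comma-separated file with no header and at least one line; gz_to_big_matrix is a hypothetical helper, not part of bigmemory.

```r
library(bigmemory)

# Hypothetical helper: build a big.matrix from a gzipped, delimited numeric text
# file through a gzfile() connection, without writing an uncompressed copy.
# Assumes no header, a constant number of columns, and a non-empty file.
gz_to_big_matrix <- function(gz_path, sep = ",", chunk_lines = 100000L,
                             type = "double") {
  # Pass 1: take the column count from the first line, then count all rows.
  con <- gzfile(gz_path, open = "rt")
  first <- readLines(con, n = 1L)
  n_col <- length(strsplit(first, sep, fixed = TRUE)[[1]])
  n_row <- 1L
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    n_row <- n_row + length(lines)
  }
  close(con)

  X <- big.matrix(nrow = n_row, ncol = n_col, type = type)

  # Pass 2: parse each chunk of lines and copy it into the matching rows.
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  offset <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    vals <- scan(text = lines, sep = sep, quiet = TRUE)  # values in row order
    block <- matrix(vals, ncol = n_col, byrow = TRUE)
    X[offset + seq_len(nrow(block)), ] <- block
    offset <- offset + nrow(block)
  }
  X
}

# Demo input (assumption): a small 4 x 3 matrix written as gzipped CSV.
gz_path <- file.path(tempdir(), "demo2.txt.gz")
m <- matrix(as.numeric(1:12), nrow = 4, ncol = 3)
con <- gzfile(gz_path, open = "wt")
write.table(m, con, sep = ",", row.names = FALSE, col.names = FALSE)
close(con)

# A tiny chunk size is used here only to exercise the chunked loop.
X <- gz_to_big_matrix(gz_path, chunk_lines = 3L)
X[, ]
```

The design trades time for space: the file is decompressed twice, but neither the full text nor an uncompressed copy ever exists at once.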

Hey, check this function. You can find a vignette with more information. It may not be super fast, but it is quite flexible. Check all the arguments you need to specify, especially file.nline, which you have to provide explicitly because the function can't compute it on a compressed file.

privefl, Jun 13 '17 21:06
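As noted above, the line count (file.nline) has to be supplied explicitly for a compressed file. One way to obtain it without writing an uncompressed copy is to stream the decompressed bytes through a gzfile() connection in binary mode and count newline bytes. count_gz_lines is a hypothetical helper, and the 1 MiB chunk size is an arbitrary assumption.

```r
# Hypothetical helper: count the lines of a gzipped text file by reading the
# decompressed bytes in fixed-size chunks and counting "\n" (byte 0x0A).
count_gz_lines <- function(gz_path, chunk_bytes = 2^20) {
  con <- gzfile(gz_path, open = "rb")
  on.exit(close(con))
  n <- 0
  last <- as.raw(10L)
  repeat {
    bytes <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(bytes) == 0L) break
    n <- n + sum(bytes == as.raw(10L))
    last <- bytes[length(bytes)]
  }
  # A final line without a trailing newline still counts as a line.
  if (last != as.raw(10L)) n <- n + 1
  n
}

# Demo input (assumption): 1000 short lines written to a gzipped file.
gz_path <- file.path(tempdir(), "demo_lines.txt.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(as.character(seq_len(1000)), con)
close(con)

count_gz_lines(gz_path)
```

Counting raw bytes avoids building a character vector for every line, so it is usually cheaper than calling readLines() just to count.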

Any updates on this? I am trying to read a large .txt.gz file that contains character/string data. I know fread can read .txt.gz files, but the file is larger than my available RAM. I can't use bigstatsr::big_read because it does not support character type data.

Would it be possible to combine read.big.matrix with fread in some way, to support reading .gz files?

jarbet, Mar 28 '23 17:03

Maybe this?

privefl, Mar 29 '23 05:03


Cool, I see they have a workaround for reading .gz files, so this should work. Thanks!

jarbet, Mar 29 '23 16:03