poretools

Tarfile based fastq extraction

Open timp0 opened this issue 9 years ago • 2 comments

It seems to me that poretools, when working on a tarfile, currently extracts the files one by one to a .poretools TMPDIR. With R9 data yields, this might overwhelm a hard drive, since you then have an expanded duplicate of the data on disk.

Would it instead make sense to delete each file once it has been processed, or even to stream the extracted fast5 straight into h5py for manipulation (not sure how possible this is with the tarfile module)?

I'm trying to do this myself because my R9 data is large (>800Mbp, ~500Gb tgz) (#R9_problems)

timp0, Jun 29 '16 19:06

I will try to carve out some time to work on this. I see your point.

arq5x, Jul 05 '16 15:07

It looks like there is a limitation in h5py which stops us from using the files without writing them to disk somewhere (https://github.com/h5py/h5py/issues/730).

It might be possible to only ever have one file written to disk, of course.

duncanparkes, Nov 03 '16 14:11
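
For illustration, here is a minimal sketch of the "only ever one file on disk" idea raised in this thread. It is not poretools code: the function name `iter_members_one_at_a_time`, its `suffix` and `tmp_root` parameters, and the choice of a generator are all assumptions made for the example. It uses only the standard-library `tarfile`, `tempfile` and `shutil` modules, writes each matching member to a temporary file, hands the path to the caller, and deletes the file before moving on to the next member.

```python
import os
import shutil
import tarfile
import tempfile


def iter_members_one_at_a_time(tar_path, suffix=".fast5", tmp_root=None):
    """Yield (member_name, temp_path) for each matching tar member.

    Hypothetical helper: at most one extracted file exists on disk at any
    time, because each temporary file is removed before the next member
    is extracted.
    """
    # "r:*" lets tarfile detect the compression (e.g. gzip) transparently.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            src = tar.extractfile(member)
            fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=tmp_root)
            try:
                # Copy the member's bytes to the temporary file in chunks.
                with os.fdopen(fd, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                yield member.name, tmp_path
            finally:
                # Runs when the caller asks for the next item, stops early,
                # or raises, so the duplicate never accumulates.
                if os.path.exists(tmp_path):
                    os.remove(tmp_path)
```

A caller would loop over the generator and open each `temp_path` inside the loop body, for example with `h5py.File(temp_path, "r")`, closing it before the next iteration. The trade-off is that peak extra disk usage is bounded by the largest single member rather than by the whole archive, at the cost of still writing every file to disk once.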