Help on read/write to disk with parallel compression
Hello. I am trying to use the parallel feature of Blosc compression to write (and read) a large (~100GB) dataset to disk, similar to h5py's (or zarr's) create_dataset. I could not find a simple example of how to write a numpy ndarray to disk. Could you please point me in the right direction? Thanks.
Hi @piby2 ,
In the latest version, there is a benchmarking file that can help you: https://github.com/Blosc/cat4py/blob/master/bench/compare_getslice.py
To be able to run it, you should update to the latest version, either by cloning cat4py again (recommended option):
git clone --recurse-submodules https://github.com/Blosc/cat4py
or by updating your master branch to the latest commit.
Then compile it using:
rm -rf _skbuild cat4py/*.so* # If you have a previous build
python setup.py build_ext --build-type=RelWithDebInfo
To check the installation, run:
PYTHONPATH=. pytest
Finally, run the benchmark:
PYTHONPATH=. python bench/compare_getslice.py 1 # 1 enables persistency in this benchmark
PS: You shouldn't use the -O0 flag, as it disables all compiler optimizations.