
Got "Invalid data: snappy::Uncompress failed" when decompressing raw file

Open zxybazh opened this issue 7 years ago • 8 comments

I compressed a file with snzip -t raw file, but when I run snzip -t raw -d file.raw I get the "snappy::Uncompress failed" error message.

zxybazh avatar Oct 14 '18 18:10 zxybazh

Could you post more information?

  • OS version and CPU architecture.
  • Test data

It works for me.

$ ./snzip -t raw INSTALL
$ ./snzip -t raw -d INSTALL.raw 

My environment is: OS: Linux (Ubuntu 16.04 x86_64) Test data: INSTALL

kubo avatar Oct 15 '18 12:10 kubo

Hi, I did the test on Ubuntu 16.04, CPU Intel(R) Core(TM) i7-7700. Test data is right here, part of a TPCH dataset. Please check, thanks!

zxybazh avatar Oct 16 '18 03:10 zxybazh

Thanks. The compressed file is broken because the input data is too big. The maximum size of raw uncompressed data is 4G according to this information.

There are two choices.

  1. Make snzip -t raw fail when the file size is over 4G.
  2. Split file data by 4G and create a compressed file containing concatenated compressed split data.
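The 4G cap comes from the raw snappy format itself: a raw stream begins with the uncompressed length stored as a little-endian varint, and the format specifies a maximum length of 2^32 - 1. A minimal sketch of that preamble rule (the function names here are ours, not snappy's API):

```python
MAX_RAW_LEN = 2**32 - 1  # format limit on the uncompressed length

def encode_preamble(length: int) -> bytes:
    """Encode the uncompressed length as a little-endian varint."""
    if length > MAX_RAW_LEN:
        raise ValueError("raw snappy cannot represent lengths of 4 GiB or more")
    out = bytearray()
    while True:
        byte = length & 0x7F
        length >>= 7
        if length:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_preamble(data: bytes) -> tuple:
    """Return (uncompressed length, number of preamble bytes consumed)."""
    length = shift = 0
    for i, byte in enumerate(data):
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return length, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

Data whose length exceeds MAX_RAW_LEN simply has no valid preamble, which is why compressing such a file in raw mode cannot round-trip.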

kubo avatar Oct 16 '18 13:10 kubo

Got it, thanks.

zxybazh avatar Oct 16 '18 17:10 zxybazh

  1. Make snzip -t raw fail when the file size is over 4G.
  2. Split file data by 4G and create a compressed file containing concatenated compressed split data.

The latter is impossible. I can create a file containing concatenated raw compressed data; however, I cannot decompress it, because snappy checks whether all input data has been consumed via decompressor->eof(). When two raw compressed blocks are concatenated, there is no way to find the boundary between them.

kubo avatar Oct 24 '18 13:10 kubo

I believe we have to define a new file format that stores the length of each split of raw compressed data over 4G, so that the splits can be separated again when decompressing.
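The idea above can be sketched as a length-prefixed framing: split the input into chunks and write each chunk's compressed size before the chunk, so the decompressor can recover the boundaries. zlib stands in for snappy here because it is in the Python standard library; the framing logic, not the codec, is the point, and the chunk size is tiny only for the demo.

```python
import struct
import zlib

CHUNK = 4  # tiny for the demo; a real tool would use up to 4 GiB - 1

def compress_framed(data: bytes) -> bytes:
    """Compress each chunk separately, prefixed with its compressed size."""
    out = bytearray()
    for i in range(0, len(data), CHUNK):
        blob = zlib.compress(data[i:i + CHUNK])
        out += struct.pack("<I", len(blob))  # 4-byte little-endian length header
        out += blob
    return bytes(out)

def decompress_framed(data: bytes) -> bytes:
    """Walk the length headers to find each chunk boundary and decompress."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        (size,) = struct.unpack_from("<I", data, pos)
        pos += 4
        out += zlib.decompress(data[pos:pos + size])
        pos += size
    return bytes(out)
```

Note that snappy's existing framing format already works this way (chunks with length headers), which is one reason a new ad-hoc format may not be worth inventing.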

zxybazh avatar Oct 24 '18 20:10 zxybazh

What merit would the new file format have? I won't reinvent the wheel unless it has an explicit merit.

kubo avatar Oct 25 '18 13:10 kubo

Well, you're right. Let's not reinvent the wheel. I just want to make sure that we can find the boundary of every split when we decompress the file. If something that does this already exists, even better. For now, you may just make it fail when the file size is over 4G.

zxybazh avatar Oct 25 '18 20:10 zxybazh