DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress fails to decompress .gz files
Summary
I built the gzip decompressor, but it can only decompress the file provided in the repo. Most files that I compressed with the CPU version of gzip cannot be decompressed.
Version
I am currently using oneAPI 2022.1 and the Code Sample v0.0.51 extension for VS Code to pull the code.
Environment
Ubuntu 20.04
Steps to reproduce
Use the built-in Linux gzip to compress a file. The compression ratio should be around 4, and the output file should be larger than 50MB. Then use the executable compiled from this repo to decompress the file created by the built-in Linux gzip.
Observed behavior
The decompression gets stuck in a while loop and never terminates. I printed some information to help debug the issue and found that the bug appears to be located in the Huffman decoding kernel.
Here is the debug log I printed:
Using GZIP decompression
Decompressing '../data/gzip/log-test.log.gz' 2 times
Input file size : 677.904KB
expected output size :2706127
Launching kernels for run 0
literals_per_cycle for producer is : 1
in_count = 677904
Iteration count = 677904
literals per cycle is : 4
literal per cycle from consumer = 4
huffman shifting = 1098876287
huffman shifting = 1098876287
huffman shifting = 1098876287
huffman shifting = 1098876287
huffman shifting = 1098876287
Expected behavior
I expected the decompressor to be able to decompress every file I compressed with the CPU version of gzip.
I have attached the three files I used for debugging: log-test.log.gz, txt-test2.txt.gz, and pdf-test.pdf.gz.
for (unsigned short symbol = 0; symbol < max_codes; symbol++) {
  // literal
  if (symbol < numlitlencodes) {
    auto inner_codelen = codelens[symbol];
    if (inner_codelen == codelen) {
      lit_map[lit_map_counter] = symbol;
      lit_map_counter++;
      lit_map_next_code++;
    }
  }
  // distance
  if (symbol < numdistcodes) {
    auto inner_codelen = codelens[numlitlencodes + symbol];
    if (inner_codelen == codelen) {
      dist_map[dist_map_counter] = symbol;
      dist_map_counter++;
      dist_map_next_code++;
    }
  }
}
lit_map_last_code[codelen - 1] = lit_map_next_code;
dist_map_last_code[codelen - 1] = dist_map_next_code;
After carefully checking the source code excerpt above, I found that the bug is caused by an overflow. lit_map_last_code, dist_map_last_code, lit_map_next_code and dist_map_next_code are all of type ac_uint<15>. If lit_map_next_code or dist_map_next_code reaches 0b111 1111 1111 1111 (32767, the largest 15-bit value) and is then incremented, it wraps around to 0. That wrapped value is then assigned to lit_map_last_code or dist_map_last_code, which are used for decoding. Because the last code is now smaller than the first code, none of the codes with code length 15 are decoded correctly.
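The wraparound can be illustrated with a minimal sketch in plain C++. This is not the sample's code: Inc15, Inc16, InRange and kMask15 are hypothetical names, the 15-bit counter is emulated by masking a uint16_t instead of using ac_uint<15>, and the half-open range check (first <= code < last) is only my assumption about how the decoder uses the last-code table. It suggests that giving the counters one extra bit would be enough to avoid the problem.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ac_uint<15>: only the low 15 bits are kept.
constexpr std::uint16_t kMask15 = 0x7FFF;

// Increment the way a 15-bit counter would: 0x7FFF + 1 wraps to 0.
constexpr std::uint16_t Inc15(std::uint16_t v) {
  return static_cast<std::uint16_t>((v + 1) & kMask15);
}

// Increment with one extra bit of headroom: 0x7FFF + 1 becomes 0x8000.
constexpr std::uint16_t Inc16(std::uint16_t v) {
  return static_cast<std::uint16_t>(v + 1);
}

// Assumed decoder check: a code of this length is valid if it lies in
// the half-open range [first, last).
constexpr bool InRange(std::uint16_t code, std::uint16_t first,
                       std::uint16_t last) {
  return code >= first && code < last;
}
```

With these helpers, Inc15(0x7FFF) yields 0, so a 15-bit code such as 0x7FFE no longer falls inside the range and is never matched, while Inc16(0x7FFF) yields 0x8000 and the same code is accepted. In the sample itself, the equivalent change would presumably be widening the four variables to ac_uint<16>, though I have not verified this against the rest of the kernel.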