compressed-vec

XorAppender is not providing the expected compression

Open • msk-apk opened this issue 1 year ago • 2 comments

I used the code below to compress a vector of f32 values, but it's not achieving even 50% compression.

Here is the output for a vector with dimension 256:

    original size in bytes of f32 vector 1048
    compressed vector size in bytes 1011
    uncompressed size of f32 vector in bytes 1048

And for a vector with dimension 1000:

    original size in bytes of f32 vector 4120
    compressed vector size in bytes 3854
    uncompressed size of f32 vector in bytes 4120
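A side note on how these sizes are measured: both "original" figures fit exactly the pattern 24 + 4 × capacity (1048 = 24 + 4 × 256, 4120 = 24 + 4 × 1024). That is, a 24-byte `Vec` header on a 64-bit target plus 256 or 1024 allocated f32 slots. However, the loop in the code below pushes only 255 or 999 values, because `1..dimension` excludes 0. If `get_size` counts allocated capacity (an assumption about the get-size crate that these numbers are consistent with, not something confirmed in this thread), the raw payloads are really 1020 and 3996 bytes. The following std-only sketch contrasts the two measurements. The helper names `payload_bytes` and `allocated_bytes` are hypothetical.

```rust
use std::mem::size_of;

/// Bytes occupied by the f32 values themselves (uses len, not capacity).
fn payload_bytes(v: &[f32]) -> usize {
    v.len() * size_of::<f32>()
}

/// Vec header plus the whole allocated buffer, including unused capacity.
/// Hypothetical helper mirroring the 24 + 4 * capacity pattern in the report.
fn allocated_bytes(v: &Vec<f32>) -> usize {
    size_of::<Vec<f32>>() + v.capacity() * size_of::<f32>()
}

fn main() {
    let dimension = 1000;
    let mut data: Vec<f32> = Vec::new();
    // Same bounds as the issue's loop: 1..dimension pushes dimension - 1 values.
    for i in 1..dimension {
        data.push(i as f32);
    }
    println!("values pushed:   {}", data.len());
    println!("payload bytes:   {}", payload_bytes(&data));
    println!("allocated bytes: {}", allocated_bytes(&data));
}
```

This prints 999 values and 3996 payload bytes. The allocated figure depends on `Vec`'s growth policy, which the standard library does not guarantee, so it is only ever at least as large as the payload.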

Is there any way I can get a better compression ratio?

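    // Module paths below are assumed; adjust them to match the compressed-vec version in use.
    use compressed_vec::sink::VecSink;
    use compressed_vec::vector::{VectorF32XorAppender, VectorReader};
    use get_size::GetSize; // provides .get_size()
    use rand::Rng; // provides .gen()

    fn main() {
        let mut appender = VectorF32XorAppender::try_new(2048).unwrap();
        let dimension = 1000;
        let mut data: Vec<f32> = vec![];
        let mut rng = rand::thread_rng();
        // Note: 1..dimension pushes dimension - 1 values (999 here).
        for _ in 1..dimension {
            let value: f32 = rng.gen();
            data.push(value);
        }
        println!("original size in bytes of f32 vector {}", data.get_size());

        // Compress with the XOR-based f32 encoder.
        let finished_vec = appender.encode_all(data).unwrap();
        println!("compressed vector size in bytes {}", finished_vec.get_size());

        // Decode everything back into a Vec<f32>.
        let reader = VectorReader::<f32>::try_new(&finished_vec[..]).unwrap();
        let mut sink = VecSink::<f32>::new();
        reader.decode_to_sink(&mut sink).unwrap();
        let uncompressed_data: Vec<f32> = sink.vec;
        println!("uncompressed size of f32 vector in bytes {}", uncompressed_data.get_size());
    }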

msk-apk, Jul 25 '24 11:07

The XOR compressor relies on similarity between successive values, so it works best when the data isn't changing very much. From your example, it looks like you are generating random values, which is pretty much the worst possible case for this kind of compression. It is designed instead for real-life floating-point time series, which tend not to change very fast most of the time.
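To illustrate why random input defeats this scheme, here is a minimal std-only Rust sketch. It is not compressed-vec's actual encoder. It XORs each f32's bit pattern with the previous one and counts the "meaningful" bits left between the leading and trailing zero runs. XOR-based float codecs such as Gorilla spend space roughly in proportion to that count, plus a few control bits per value. The step-shaped series and the `XorShift32` generator are hypothetical test data chosen so the example needs no external crates.

```rust
/// XOR each f32's bit pattern with the previous one (the first value is XORed with 0).
fn xor_deltas(values: &[f32]) -> Vec<u32> {
    let mut prev = 0u32;
    values
        .iter()
        .map(|v| {
            let bits = v.to_bits();
            let x = bits ^ prev;
            prev = bits;
            x
        })
        .collect()
}

/// Bits between the leading and trailing zero runs; an unchanged value (XOR == 0) costs 0.
fn meaningful_bits(x: u32) -> u32 {
    if x == 0 {
        0
    } else {
        32 - x.leading_zeros() - x.trailing_zeros()
    }
}

/// Average meaningful bits per value, skipping the first (XORed with 0) entry.
fn avg_meaningful_bits(values: &[f32]) -> f64 {
    let deltas = xor_deltas(values);
    let total: u32 = deltas.iter().skip(1).map(|&x| meaningful_bits(x)).sum();
    total as f64 / (deltas.len() - 1) as f64
}

/// Tiny xorshift PRNG (hypothetical stand-in for rand's thread_rng).
struct XorShift32(u32);

impl XorShift32 {
    /// Uniform value in [0, 1) built from the top 24 bits of the state.
    fn next_f32(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        (x >> 8) as f32 / (1u32 << 24) as f32
    }
}

fn main() {
    let n = 1000;
    // Slowly varying series: a reading that steps by 0.5 every 50 samples.
    let smooth: Vec<f32> = (0..n).map(|i| 20.0 + (i / 50) as f32 * 0.5).collect();
    let mut rng = XorShift32(0x1234_5678);
    let random: Vec<f32> = (0..n).map(|_| rng.next_f32()).collect();

    println!("smooth: {:.2} meaningful bits/value", avg_meaningful_bits(&smooth));
    println!("random: {:.2} meaningful bits/value", avg_meaningful_bits(&random));
}
```

For the step series, almost every XOR is zero, so the average stays well under one bit per value. For uniform random floats, most of the 23 mantissa bits (and often some exponent bits) differ between neighbours, so the average lands near the full 32-bit width. That matches the poor ratio reported above.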

velvia, Jul 26 '24 08:07

Got it, thanks.

msk-apk, Jul 26 '24 12:07