
(question) cusz changes input data after compression

Open maltempi opened this issue 3 years ago • 5 comments

Hey everybody,

I've been using the cusz APIs, and after compressing input data I noticed that the input data is not the same anymore; it looks modified. This can be reproduced with the cusz API example.

As far as I understand, according to cusz's wiki the nondestructive=true configuration should prevent this behavior, but switching the flag made no difference. I also checked the source code and didn't find any implementation of it.

Are my assumptions correct? Can I consider that cusz nowadays changes the input data after compression or am I doing something wrong?

Thanks very much!

maltempi, Oct 21 '22 00:10

Hi @maltempi,

Sorry about your experience. Yes, you are correct; the functionality is not there while the API is.

Can you temporarily duplicate the input data before running the compressor so your experiments can continue? I will fix this issue soon. The first fix will be to duplicate the input data internally, which should reach you quickly. Later, I will rewrite the memory management (and maybe the file format).

In addition, the current memory footprint does not scale well. If you run into such an issue, which can show up as a "segmentation fault" (it won't report "out-of-memory" directly), please also note it here.
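A minimal sketch of that workaround, in plain C++ on host memory. `compress_in_place` is a hypothetical stand-in for any compressor that overwrites its input; it is not the cusz API, and the quantization inside it is only there to make the overwrite visible. The point is the calling pattern: hand the compressor a scratch copy and keep the original untouched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compressor that overwrites its input
// (NOT the cusz API): it quantizes each value in place.
void compress_in_place(std::vector<float>& data, float error_bound) {
    for (float& v : data) {
        v = static_cast<float>(static_cast<long>(v / error_bound)) * error_bound;
    }
}

// Workaround: give the compressor a scratch copy so the original survives.
// The copy costs an extra 1x the input size in memory.
std::vector<float> compress_preserving_input(const std::vector<float>& input,
                                             float error_bound) {
    std::vector<float> scratch(input);  // duplicate the input
    compress_in_place(scratch, error_bound);
    return scratch;
}
```

For buffers that live on the GPU, the same pattern would presumably use a device-to-device copy (for example `cudaMemcpy` with `cudaMemcpyDeviceToDevice`) into a second allocation before calling the compressor.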

Thank you.

jtian0, Oct 21 '22 16:10

Hey @jtian0, thank you very much for your prompt response! OK, I just wanted to confirm that I was heading in the right direction. I'll make a copy of the input before compressing it.

> In addition, the current memory footprint does not scale well. If you experience such an issue---can be a "segmentation fault" (it won't show "out-of-memory" directly), please also mark it here.

Thanks for the heads-up! I'll keep an eye on it.

maltempi, Oct 24 '22 10:10

Hi @maltempi,

I made a temporary fix that no longer destroys the input data and pushed it to an unstable branch. See this and this.

Internally, it allocates an array for outliers by default (1x the input data size), which results in an extra memory footprint. Therefore, it is only for demonstration. Proper memory management needs a more thorough fix rather than a patch.
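To make that cost concrete, here is a back-of-the-envelope estimate. It assumes only what is stated above (an outlier array of 1x the input size on top of the input itself); `footprint_bytes` is a hypothetical helper, and any other internal buffers are not modeled.

```cpp
#include <cstddef>

// Rough footprint: the input plus a default outlier array of 1x the input size.
// Other internal buffers are deliberately left out of this estimate.
constexpr std::size_t footprint_bytes(std::size_t num_elements,
                                      std::size_t bytes_per_element) {
    const std::size_t input_bytes = num_elements * bytes_per_element;
    const std::size_t outlier_bytes = input_bytes;  // 1x the input size
    return input_bytes + outlier_bytes;
}
```

Under these assumptions, a 1 GiB float32 input (2^28 elements) needs at least 2 GiB before any other allocation is counted.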

Conclusion

  • What's fixed: destroying input data
  • Known issue after the fix: increased memory footprint by default
  • What's the next fix: rewriting the memory management

jtian0, Nov 08 '22 21:11

Hi @jtian0,

Is there any plan to really fix this issue, that is, to rewrite the memory management? We would like to use cusz, but this problem is a blocker for us and the temporary solution is not good enough.

Thanks!

hyviquel, Dec 15 '22 18:12

Hi @hyviquel,

Thank you for still having confidence in cusz. I apologize for the severe delay in development; travel and then coursework have occupied November and December until now (the final week of the semester).

And yes, it is planned, because it is (also) the blocker for any further plans for cusz. I think mid-December to mid-January could be a good window for me to address the piled-up issues, especially the memory footprint issue. I'll keep this issue open until it is sufficiently resolved.

jtian0, Dec 17 '22 06:12