
FEATURE - Support for .7z file format in ArchiveInterpreter block

Open robinrj6 opened this issue 1 year ago • 3 comments

User Story

  1. As a user
  2. I want to use a .7z compressed file
  3. So that I can use FilePicker to access the .csv file from the compressed file

User Acceptance Criteria

  • [ ] {acceptance criteria 1}

Examples

When a dataset is published as the result of a journal paper, the publishers may release all code and intermediate data schemas together in a single compressed file. The 7z format is lossless, so it saves space without any loss of data, and it is widely preferred for this purpose.

Notes

Definitions of Done

  • [ ] A PR has been opened and accepted
  • [ ] All user acceptance criteria are met
  • [ ] All tests are passing

robinrj6 avatar May 20 '24 11:05 robinrj6

@robinrj6 Do you have an example data source?

georg-schwarz avatar May 21 '24 20:05 georg-schwarz

This is the database I tried to download: https://figshare.com/ndownloader/files/37887318. It contains the complete project code together with datasets in different data schemas, all compressed in .7z format.

robinrj6 avatar May 22 '24 20:05 robinrj6

Hey, thanks for pointing this out. It seems like this would be great to support.

If anyone wants to take a stab at this: extraction is done in the ArchiveInterpreter block, and the implementation in the interpreter is here: https://github.com/jvalue/jayvee/blob/b20fcd91d70f5fd04982668f40d10953688c6653/libs/extensions/std/exec/src/archive-interpreter-executor.ts

We are using JSZip (https://stuk.github.io/jszip/) to extract zip archives in the interpreter. Sadly, it does not seem to support 7zip.

In contrast, 7z-wasm does support it, but it is under a GPL license: https://github.com/use-strict/7z-wasm?tab=License-1-ov-file.
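
For whoever picks this up, here is a minimal sketch of how an executor could tell a zip archive from a 7z archive before choosing an extraction library. It is not taken from the Jayvee code base: `ArchiveKind`, `SIGNATURES`, and `detectArchiveKind` are hypothetical names used only for illustration. The only facts it relies on are the standard leading bytes of the two formats.

```typescript
// Hypothetical helper, not part of Jayvee: identify an archive by its magic bytes.
type ArchiveKind = 'zip' | '7z' | 'unknown';

interface Signature {
  kind: Exclude<ArchiveKind, 'unknown'>;
  bytes: number[];
}

const SIGNATURES: ReadonlyArray<Signature> = [
  // "PK\x03\x04": local file header that starts a non-empty zip archive.
  { kind: 'zip', bytes: [0x50, 0x4b, 0x03, 0x04] },
  // "7z\xBC\xAF\x27\x1C": signature at the start of every 7z archive.
  { kind: '7z', bytes: [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c] },
];

function detectArchiveKind(data: Uint8Array): ArchiveKind {
  for (const { kind, bytes } of SIGNATURES) {
    // Compare the first bytes of the file against each known signature.
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return kind;
    }
  }
  return 'unknown';
}
```

A check like this would let the block fail early with a clear message when it receives a .7z file, and it leaves the choice of the actual 7z extraction library (and its license question) open.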

rhazn avatar May 27 '24 13:05 rhazn