jszip

zipping 2 million small files

Open eitzenbe opened this issue 10 years ago • 7 comments

Hi Stuart,

We are considering using jszip to create an archive from 2 million small text files. Would you see this as problematic? Have you ever tested this or similar numbers?

We would be more than happy if you could shed some light on this :)

br Thomas

eitzenbe avatar Aug 10 '15 18:08 eitzenbe

I see two parts with the current API, adding the files and generating the zip file. The first part likely won't make it (I'm testing it right now and it uses way too much memory at 1.5M files). I didn't test the second part but it won't work without #195. To handle 2 million files (or more), we would need to avoid holding everything in memory and stream the output (#195 is a step in the right direction but not enough for your use case).

dduponchel avatar Aug 16 '15 20:08 dduponchel

Hi David,

Thanks for your feedback. We will resort to OS-level zipping/unzipping, as we have to deal with up to 2 million files NOW ;)

br Thomas

eitzenbe avatar Aug 17 '15 11:08 eitzenbe

#195 was a big milestone toward making this possible in Node.

For the browser it's another story.

I see two parts with the current API, adding the files and generating the zip file. The first part likely won't make it

I have shown you in some fiddles over at #343 that the first part is possible when including Node streams in the browser, but this could be made even easier once web ReadableStream, and possibly iterators, are supported as well.

Lazy loading would also help: the function would get called when it's the file's turn to be compressed (but something like this hasn't been implemented yet).

// Proposed usage (not implemented): jszip would call this when it is the file's turn
function download (filePath) {
  // Explicit promise, just to be extra clear about what is returned
  return new Promise(function (resolve) {
    fetch('/dl/' + filePath).then(function (res) {
      resolve(res.body) // web ReadableStream
    })
  })
  // You could just do:
  //   return fetch('/dl/' + filePath).then(res => res.body)
  //
  // Or, if you don't need to wait for something, do it without a promise:
  //   return new Blob || new ReadableStream({...}) || fs.createReadStream(...)
}

// Same as above but more modern
async function download(filePath){
  let res = await fetch(...)
  return res.body
}

zip.file('OS.iso', download)
zip.file('readme.md', download)
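
The lazy-loading idea above can be sketched without jszip at all. In the following minimal sketch, `LazyArchive`, its `file` method, and `pending` are hypothetical names made up for illustration; they are not part of jszip. The only point is that each loader function runs when its entry is pulled, not when it is registered.

```javascript
// Hypothetical sketch: LazyArchive is NOT part of jszip. It only illustrates
// calling a loader function when it is that entry's turn to be processed.
class LazyArchive {
  constructor () {
    this.entries = [] // [{ name, load }]; nothing is loaded at this point
  }

  // Register a name together with a function that produces the content later.
  file (name, load) {
    this.entries.push({ name, load })
    return this
  }

  // Generator over the entries. A loader is only invoked when the consumer
  // asks for the next entry; `data` may be a plain value or a promise.
  * pending () {
    for (const { name, load } of this.entries) {
      yield { name, data: load(name) }
    }
  }
}
```

A consumer would loop with `for (const { name, data } of archive.pending())` and `await data` before compressing, so only the entry currently being processed needs to be in memory.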

Generating the zip file

You also need to flush memory as jszip makes progress. To do that, create a stream and pipe its content to a destination such as a sandboxed filesystem, IndexedDB, or directly to the hard drive with the help of StreamSaver.js.

zip.generateInternalStream(...)
.on('data', doSomething)
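
To make the snippet above a bit more concrete, here is a minimal sketch of forwarding chunks to a destination as they are produced. It assumes `zip` is a JSZip v3 instance; `streamZipTo` and the `sink` object (with `write` and `close` methods) are hypothetical names used only for illustration, standing in for whatever destination you pick.

```javascript
// `zip` is assumed to be a JSZip v3 instance. `sink` is a hypothetical object
// with write(chunk) and close() methods, e.g. a thin wrapper around a writer.
function streamZipTo (zip, sink) {
  return new Promise(function (resolve, reject) {
    zip
      .generateInternalStream({ type: 'uint8array', streamFiles: true })
      .on('data', function (chunk) {
        // Hand each chunk off right away instead of collecting them.
        sink.write(chunk)
      })
      .on('error', reject)
      .on('end', function () {
        sink.close()
        resolve()
      })
      .resume() // the stream starts paused; nothing happens until resume()
  })
}
```

This sketch does no backpressure handling: if the destination is slower than the compressor, chunks can still pile up on the sink's side, so a real implementation would probably need to pause and resume the stream.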

The way I see it, there are 3 parts, all 3 of which are already done:

  • [x] lazily adding files (could be made easier with functions); the only way atm is with Node streams
  • [x] writing the zip (using generateInternalStream)
  • [x] flushing the part of the file that is currently being compressed
    atm it's: read a whole file, compress it, flush it, and repeat with the next file
    This works okay for many smaller files, but not where one file is very large
    opt.streamFiles solves it

jimmywarting avatar Sep 24 '16 15:09 jimmywarting

The last bullet point is already implemented: options.streamFiles (which defaults to false). Or are you thinking of something like #356?

dduponchel avatar Sep 24 '16 18:09 dduponchel

Hmm, streamFiles, interesting... Didn't know about it... How widely supported are data descriptors? I wish I knew in more depth how a zip file is constructed, so I had some idea of what is possible.

Not sure I really follow #356 and what it will mean.

jimmywarting avatar Sep 24 '16 19:09 jimmywarting

How widely supported are data descriptors?

Popular archive managers should support them correctly but I didn't test it. We have zip files (1, 2, 3) with data descriptors if you want to test it on your side. I've seen issues about supporting data descriptors on the internet (but not that many) so I used a safe default value.
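
For anyone testing on their side, one way to tell whether a given zip actually uses data descriptors is to look at bit 3 of the general purpose bit flag in a local file header. The sketch below only inspects the first entry; `firstEntryUsesDataDescriptor` is a hypothetical helper name, not a jszip function, and it assumes the file starts with a local file header (an empty archive does not).

```javascript
// Returns true when the first entry sets general purpose bit 3, meaning its
// CRC-32 and sizes are stored in a data descriptor after the file data.
// `bytes` is a Uint8Array holding at least the first 8 bytes of the zip.
function firstEntryUsesDataDescriptor (bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  // Local file header signature "PK\x03\x04", read as little-endian.
  if (view.byteLength < 8 || view.getUint32(0, true) !== 0x04034b50) {
    throw new Error('not a zip local file header')
  }
  const flags = view.getUint16(6, true) // general purpose bit flag
  return (flags & 0x0008) !== 0
}
```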

Wish I knew more in depth how a zip file is constructed so I had some knowledge of what is possible.

I really need to write documentation for this part.

Not sure I really follow #356 and what it will mean.

I'll improve the issue.

dduponchel avatar Sep 24 '16 21:09 dduponchel

Judging by the year this issue has been around and all the new features jszip now supports, I would assume we could close this now...

btw, the zip files worked fine on my machine

jimmywarting avatar Sep 24 '16 21:09 jimmywarting