fast-xml-parser

Parsing directly ArrayBuffer

Open lpatiny opened this issue 4 years ago • 6 comments

In the project https://github.com/cheminfo/mzData we are using fast-xml-parser to parse scientific data (mass spectra).

These data sets can be quite large, and the parser works perfectly even with 400 MB files.

However, we may have files of 1 GB or more, and there is currently a string size limit of 512 MB (imposed by JavaScript in Chrome).

I wonder if it would be possible to accept an ArrayBuffer directly, and not only text. The current code works almost exclusively on the array of characters, so most of it could be compatible with an ArrayBuffer, but it would need to deal with multi-byte characters.
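To illustrate the multi-byte concern, here is a minimal sketch (not part of fast-xml-parser) that decodes a `Uint8Array` in small pieces so that no single string approaches the size limit. It assumes the input is UTF-8 and relies on the standard `TextDecoder` with `stream: true`, which buffers a multi-byte sequence that is split across a chunk boundary. The helper name `decodeInChunks` and the chunk size are hypothetical.

```javascript
// Hypothetical helper for illustration; fast-xml-parser does not expose this.
// Assumption: the bytes are UTF-8 encoded.
function* decodeInChunks(bytes, chunkSize = 1 << 20) {
  const decoder = new TextDecoder("utf-8");
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray creates a view on the same memory, it does not copy.
    const chunk = bytes.subarray(offset, offset + chunkSize);
    // stream: true keeps an incomplete multi-byte sequence buffered
    // until the next call instead of emitting a replacement character.
    const text = decoder.decode(chunk, { stream: true });
    if (text.length > 0) yield text;
  }
  // Flush whatever is still buffered in the decoder.
  const tail = decoder.decode();
  if (tail.length > 0) yield tail;
}

// Tiny demo: a 4-byte chunk size forces "é" (2 bytes) to be split across chunks.
const xmlBytes = new TextEncoder().encode("<m>é€</m>");
const pieces = [...decodeInChunks(xmlBytes, 4)];
console.log(pieces.join(""));
```

A parser working directly on the bytes can go further: in UTF-8, byte values below 0x80 never occur inside a multi-byte sequence, so markup characters such as `<` and `>` can be located on the raw bytes and only the text content needs decoding.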

lpatiny avatar May 25 '21 09:05 lpatiny

I'm glad you find this repository helpful. I'll try to address your issue ASAP. You can watch the repo for new changes or star it.

github-actions[bot] avatar May 25 '21 09:05 github-actions[bot]

Data this large might not be well suited to a web application. And if the library is being used on the backend, I'm not sure it is really a good choice for your project. I believe a library that works on streams would be a better choice. ArrayBuffer might not solve the problem completely.

amitguptagwl avatar Jun 06 '21 08:06 amitguptagwl

Big data works pretty well in the browser for us. We process 1.5 GB TIFF images (electron microscopy) in JavaScript in the browser without problems.

Indeed, some libraries work on streams, but this one is faster, which is why I am interested in this improvement.

lpatiny avatar Jun 25 '21 09:06 lpatiny

Okay. To make it work perfectly for big data, we'll have to process streams. It is achievable, but it complicates the code and impacts overall performance. I'm tagging it as a feature request.

amitguptagwl avatar Jun 26 '21 09:06 amitguptagwl

We adapted the code to suit our needs, so that it directly parses a large ArrayBuffer or Uint8Array.

We had to change many things so that we could also parse a base64 encoded value (as a typedArray) to a Float64Array (and we still have a little bit of work on this).
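As a rough illustration of that conversion (a sketch only, not the code or API of arraybuffer-xml-parser), the function below decodes base64 text held as bytes in a `Uint8Array` straight into a `Float64Array` without building an intermediate string. It assumes standard base64 with no embedded whitespace, little-endian doubles, and a little-endian host; the name `base64BytesToFloat64` is hypothetical.

```javascript
// Hypothetical helper for illustration; not the arraybuffer-xml-parser API.
// Assumptions: standard base64 alphabet, no whitespace inside the input,
// payload holds little-endian 64-bit floats, host is little-endian.
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const LOOKUP = new Uint8Array(256);
for (let i = 0; i < BASE64_CHARS.length; i++) {
  LOOKUP[BASE64_CHARS.charCodeAt(i)] = i;
}

function base64BytesToFloat64(input) {
  // Ignore trailing '=' padding (0x3d).
  let end = input.length;
  while (end > 0 && input[end - 1] === 0x3d) end--;

  // Every 4 base64 characters carry 3 bytes of payload.
  const byteLength = Math.floor((end * 3) / 4);
  const out = new Uint8Array(byteLength);
  let o = 0;
  for (let i = 0; i < end; i += 4) {
    const a = LOOKUP[input[i]];
    const b = i + 1 < end ? LOOKUP[input[i + 1]] : 0;
    const c = i + 2 < end ? LOOKUP[input[i + 2]] : 0;
    const d = i + 3 < end ? LOOKUP[input[i + 3]] : 0;
    const triple = (a << 18) | (b << 12) | (c << 6) | d;
    if (o < byteLength) out[o++] = (triple >> 16) & 0xff;
    if (o < byteLength) out[o++] = (triple >> 8) & 0xff;
    if (o < byteLength) out[o++] = triple & 0xff;
  }
  // Reinterpret the decoded bytes as doubles; leftover bytes are dropped.
  return new Float64Array(out.buffer, 0, Math.floor(byteLength / 8));
}

// "AAAAAAAA8D8=" is the base64 form of the little-endian double 1.0.
const encoded = new TextEncoder().encode("AAAAAAAA8D8=");
const values = base64BytesToFloat64(encoded);
console.log(values);
```

Data files that declare big-endian values would need an explicit byte swap, for example by reading through a `DataView` instead of viewing the buffer as a `Float64Array`.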

Anyway, the new parser is working, and on my Mac mini M1 I can parse a 1 GB file in 4.5 s, which is reasonable.

https://www.npmjs.com/package/arraybuffer-xml-parser

@amitguptagwl As far as I'm concerned, you may close this issue.

lpatiny avatar Sep 01 '21 14:09 lpatiny

That's nice to hear. I'm still keeping this issue open so that it can be incorporated in a future release.

amitguptagwl avatar Sep 06 '21 05:09 amitguptagwl