pdf2json: PDF downloaded through request is unreadable, but the same PDF read from file parses fine

I intend to use pdf2json to test my PDF generator service within cucumberjs. When I read the expected PDF from file, I can parse it without problems. When I obtain the generated PDF from the service, parsing fails. After some investigation I found the cause. The Buffer returned for the file has exactly as many bytes allocated as there are bytes in the PDF. The Buffer created by the request lib when downloading the PDF from the service is backed by an ArrayBuffer larger than the number of bytes put into it. This seems to be a problem for pdf2json or the underlying PDF parser:

    { parserError: 'An error occurred while parsing the PDF: bad XRef entry' }

For the PDF read from file:

    pdfBuffer.buffer:  ArrayBuffer { byteLength: 1004 }
    pdfBuffer.length:  1004

For the downloaded PDF:

    pdfBuffer.buffer:  ArrayBuffer { byteLength: 8192 }
    pdfBuffer.length:  1004

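I worked around the problem by creating a new buffer of exactly the right length and copying the data into it. With that buffer, parsing works.

    // Allocate a buffer backed by exactly pdfBuffer.length bytes
    const bufferNew = Buffer.alloc(pdfBuffer.length);
    // Copy the PDF bytes out of the oversized buffer
    pdfBuffer.copy(bufferNew);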

It seems to me that the parser reads past pdfBuffer.length, into the unused rest of the underlying ArrayBuffer...
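
For anyone who wants to reuse the workaround, here is a minimal sketch of it as a helper. The name toExactBuffer is hypothetical and not part of pdf2json or request. The oversized buffer is simulated with a 1004-byte view over an 8192-byte ArrayBuffer, which mirrors the numbers above; I am assuming that is a fair stand-in for what the download produces.

```javascript
// Hypothetical helper (not part of pdf2json): returns a Buffer whose backing
// ArrayBuffer holds exactly the Buffer's own bytes, copying only when needed.
function toExactBuffer(buf) {
  const alreadyExact =
    buf.byteOffset === 0 && buf.buffer.byteLength === buf.length;
  if (alreadyExact) {
    return buf;
  }
  // Buffer.alloc always allocates fresh memory of the requested size.
  const exact = Buffer.alloc(buf.length);
  buf.copy(exact);
  return exact;
}

// Simulate the downloaded case: a 1004-byte view over an 8192-byte ArrayBuffer.
const oversized = Buffer.from(new ArrayBuffer(8192), 0, 1004);
const exactCopy = toExactBuffer(oversized);

console.log(oversized.buffer.byteLength, oversized.length); // 8192 1004
console.log(exactCopy.buffer.byteLength, exactCopy.length); // 1004 1004
```

The result of toExactBuffer would then be handed to the parser instead of the original buffer. A buffer that is already exact is returned as-is, so the helper should be cheap to call on buffers read from file.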

Jul 16 '18 09:07 radboudp

I ran into the exact same issue. @radboudp thanks for the workaround.

Feb 13 '19 13:02 nettad

Got exactly the same bug: it works from a physical file, but doesn't from a stream. @radboudp thanks for the workaround.

Apr 11 '19 15:04 jbdemonte

+1 thanks @radboudp

Aug 27 '21 18:08 jonaskello