pdf2json

Help: Takes 150 seconds to parse

Open nickbullll opened this issue 7 years ago • 0 comments

I'm trying to parse this document: http://imwerden.de/pdf/bocharov_roman_tolstogo_vojna_i_mir_1978__ocr.pdf. It takes about 150 seconds.

Why does it take so long? Does parsing speed depend on the font type?

By comparison, a PDF of similar size (http://gsl.mit.edu/media/programs/south-africa-summer-2015/materials/0to1.pdf) is processed in about 5 seconds.
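To compare the two documents under the same conditions, it can help to time each parse with a small wrapper. This is a minimal sketch; `timeAsync` is a hypothetical helper (not part of pdf2json) that uses Node's `perf_hooks` to measure any promise-returning task:

```javascript
const { performance } = require("perf_hooks");

// Hypothetical helper: runs an async task, logs how long it took,
// and returns both the task's result and the elapsed time in milliseconds.
async function timeAsync(label, task) {
  const start = performance.now();
  const result = await task();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${(elapsedMs / 1000).toFixed(1)} s`);
  return { result, elapsedMs };
}
```

With the `parsePdf` function from this issue, a call such as `timeAsync("tolstoy", () => parsePdf(filePath))` would report the parse time for each file, assuming both files have already been downloaded locally.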

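const PDFParser = require("pdf2json");

// Parse a PDF file and resolve with its raw text content.
const parsePdf = (filePath) => {
  return new Promise((resolve, reject) => {
    // The second argument (1) enables raw text extraction for getRawTextContent().
    const pdfParserRawText = new PDFParser(this, 1);

    pdfParserRawText.on('pdfParser_dataError', errData => {
      reject(new Error(errData.parserError));
    });
    pdfParserRawText.on('pdfParser_dataReady', pdfData => {
      const rawText = pdfParserRawText.getRawTextContent();
      resolve(rawText);
    });

    // Start loading only after both listeners are registered.
    pdfParserRawText.loadPDF(filePath);
  });
};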
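One way to check whether the slowdown comes from the content of the document rather than its file size is to count how many text elements pdf2json produces for each page. The sketch below is hypothetical and assumes the `pdfData` object passed to the `pdfParser_dataReady` handler keeps its pages in `pdfData.Pages` (pdf2json 2.x and later) or `pdfData.formImage.Pages` (1.x), each page carrying a `Texts` array:

```javascript
// Hypothetical helper: counts text elements per page in pdf2json output.
// Assumes pages live in pdfData.Pages (2.x+) or pdfData.formImage.Pages (1.x).
function countTextElements(pdfData) {
  const pages =
    pdfData.Pages ||
    (pdfData.formImage && pdfData.formImage.Pages) ||
    [];
  // Pages without a Texts array are counted as having zero text elements.
  const perPage = pages.map((page) => (page.Texts || []).length);
  const total = perPage.reduce((sum, n) => sum + n, 0);
  return { pageCount: pages.length, total, perPage };
}
```

Logging `countTextElements(pdfData)` inside the `pdfParser_dataReady` handler for both PDFs would show whether the slow file yields many more text fragments per page than the fast one.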

nickbullll, Jul 25 '18 20:07