pdf2json
Help: Parsing takes 150 seconds
I'm trying to process this document: http://imwerden.de/pdf/bocharov_roman_tolstogo_vojna_i_mir_1978__ocr.pdf. Parsing takes roughly 150 seconds.
Why does it take so long? Does the speed depend on the font type?
For comparison, a PDF of similar size (http://gsl.mit.edu/media/programs/south-africa-summer-2015/materials/0to1.pdf) is processed in about 5 seconds.
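const PDFParser = require('pdf2json');

// Resolve with the raw text of the PDF at filePath, or reject on a parser error.
const parsePdf = (filePath) => {
  // Passing 1 as the second argument requests raw text, which getRawTextContent() relies on.
  const pdfParserRawText = new PDFParser(this, 1);
  return new Promise((resolve, reject) => {
    pdfParserRawText.on('pdfParser_dataError', (errData) => {
      reject(new Error(errData.parserError));
    });
    pdfParserRawText.on('pdfParser_dataReady', () => {
      resolve(pdfParserRawText.getRawTextContent());
    });
    // Start loading only after both handlers are attached.
    pdfParserRawText.loadPDF(filePath);
  });
};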
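
To compare the two documents under the same conditions, I timed the calls with a small helper along these lines. This is only a sketch: `timeAsync` is a hypothetical name of my own, not part of pdf2json, and it only measures wall-clock time around any promise-returning function.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper (not part of pdf2json): run a promise-returning
// function and report how many seconds it took to settle successfully.
async function timeAsync(label, fn) {
  const start = performance.now();
  const result = await fn();
  const seconds = (performance.now() - start) / 1000;
  console.log(`${label}: ${seconds.toFixed(2)} s`);
  return { result, seconds };
}
```

It would be called as `timeAsync('slow pdf', () => parsePdf(localPath))`, where `localPath` stands for wherever the downloaded file was saved. The returned `seconds` value makes it easy to put both documents side by side.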