pdf2html
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
Hello, I appreciate your great work on this. I'm using this package, but the postinstall script always fails for me because of a slow network and firewall limits. And the `https` module...
Hi, after converting PDFs to HTML, I get the same HTML for every file (even though the files are different), with an almost blank body. [jpg2pdf.pdf](https://github.com/shebinleo/pdf2html/files/8785654/jpg2pdf.pdf) Resulting HTML:
I am converting quite a big file and I encounter the following error: `Conversion error: RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded` Can anyone look into the problem and help me,...
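The error code in this report comes from Node's `child_process` module: the callback-style `exec`/`execFile` functions buffer the child's output and fail once it exceeds the `maxBuffer` option. The following is a minimal sketch of that behaviour in isolation, not pdf2html's own code. The helper name `runNoisyChild` is hypothetical, and the child is a plain Node process standing in for a converter that prints a large document. Whether a given pdf2html version lets callers raise this limit is not confirmed here.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical helper: starts a child Node process that writes `bytes`
// characters to stdout, buffering its output with the given `maxBuffer`.
function runNoisyChild(bytes, maxBuffer) {
  const script = `process.stdout.write('x'.repeat(${bytes}))`;
  return new Promise((resolve) => {
    execFile(process.execPath, ['-e', script], { maxBuffer }, (err, stdout) => {
      // Resolve in both cases so the caller can inspect the outcome.
      resolve({ code: err ? err.code : null, length: stdout.length });
    });
  });
}

async function main() {
  const twoMiB = 2 * 1024 * 1024;
  // A 1 MiB limit is too small for 2 MiB of output.
  console.log(await runNoisyChild(twoMiB, 1024 * 1024));
  // A 16 MiB limit leaves enough room.
  console.log(await runNoisyChild(twoMiB, 16 * 1024 * 1024));
}

main();
```

The first call reports the same `ERR_CHILD_PROCESS_STDIO_MAXBUFFER` code as the issue; the second completes because the limit is larger than the output.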
My code looks like this:

```js
const options = { text: true }
pdf2html.pages('m.pdf', options, (err, htmlPages) => {
  if (err) {
    console.error('Conversion error: ' + err)
  } else {
    console.log(htmlPages)
  }
})
```
...
error: Command failed: java -jar D:\vscodeProject\node\myserve\node_modules\pdf2html\vendor\tika-app-2.4.0.jar --html ../docs/9.5-9.17.pdf Error: Registry key 'Software\JavaSoft\Java Runtime Environment'\CurrentVersion' has value '1.8', but '1.7' is required. Error: could not find java.dll Error: Could not find...
Is it possible to pass a buffer into the converter?
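If the installed version accepts only file paths (an assumption; this excerpt does not say), one possible workaround is to write the buffer to a temporary file and hand that path to the converter. The helper below is a hypothetical sketch, not part of pdf2html: `withTempPdf` and its `convert` callback are names introduced here for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: writes `buffer` to a temporary .pdf file, passes the
// file path to `convert`, and removes the temporary directory afterwards,
// whether or not the conversion succeeded.
async function withTempPdf(buffer, convert) {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'pdf-'));
  const file = path.join(dir, 'input.pdf');
  try {
    await fs.promises.writeFile(file, buffer);
    return await convert(file);
  } finally {
    await fs.promises.rm(dir, { recursive: true, force: true });
  }
}
```

The `convert` callback would wrap whichever converter call is in use, for example a promise around the callback-style `pdf2html.pages(file, options, callback)` call that appears elsewhere on this page.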
Hi, I'm getting this error: `.../[email protected]/node_modules/pdf2html postinstall: throw new Error(`Failed downloading dependency ${filename}.`); .../[email protected]/node_modules/pdf2html postinstall: ^ .../[email protected]/node_modules/pdf2html postinstall: Error: Failed downloading dependency tika-app-2.6.0.jar. .../[email protected]/node_modules/pdf2html postinstall: at ClientRequest.<anonymous> (/builds/infrastructure/applications_slack/patchs_data/millenium/node_modules/.pnpm/[email protected]/node_modules/pdf2html/postinstall.js:27:23) .../[email protected]/node_modules/pdf2html...
Hi, https://github.com/shebinleo/pdf2html/blame/79f4eff672ad924094acf971687ecf75f108385b/index.js#L22 Because this event was not being listened to, stderr filled up with warnings about fonts that were not installed and, according to this [issue](https://github.com/nodejs/node-v0.x-archive/issues/6764), this is why my processing...
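The situation described here, a child process whose stderr pipe is never read, can be sketched independently of pdf2html. The example below is an illustration under assumptions: `runAndDrain` is a hypothetical helper, and it attaches `data` listeners to both stdout and stderr so that a child which prints many warnings is never left waiting on a full pipe.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: runs a command and consumes both output streams.
// Reading stderr keeps its pipe from filling up while the child is still
// writing warnings to it.
function runAndDrain(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderrBytes = 0;

    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    // Chunks arrive as Buffers here, so `length` counts bytes.
    child.stderr.on('data', (chunk) => { stderrBytes += chunk.length; });

    child.on('error', reject);
    child.on('close', (code) => resolve({ code, stdout, stderrBytes }));
  });
}

// Usage: a child that writes 1 MiB of "warnings" before its real output.
const script =
  "process.stderr.write('w'.repeat(1024 * 1024)); process.stdout.write('done');";
runAndDrain(process.execPath, ['-e', script]).then((result) => {
  console.log(result.code, result.stdout, result.stderrBytes);
});
```

The stderr data is only counted here; a real caller might log it or discard it, as long as something reads it.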
Hi, `pdf2html.pages` and `pdf2html.html` return only paragraphs and links, but no images. Is it possible to get images as well? Are there any workarounds available to get images (in the right order between other textual...
Regarding this in the docs, "Java runtime environment (JRE) is required to run this module." Next.js and/or serverless backends are very popular right now. However, it might be very hard...
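Since the docs quoted above say a JRE is required, a deployment can check for one before attempting a conversion. The following is a rough sketch, assuming the runtime is exposed as a `java` executable on the `PATH`; `hasJava` is a hypothetical helper, not a pdf2html function.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: returns true when `<command> -version` can be started
// and exits with status 0. The child's output is discarded.
function hasJava(command = 'java') {
  const result = spawnSync(command, ['-version'], { stdio: 'ignore' });
  // `result.error` is set when the executable cannot be started at all,
  // for example when it is missing from the PATH.
  return !result.error && result.status === 0;
}

if (!hasJava()) {
  console.error('No usable Java runtime found on PATH.');
}
```

A check like this only reports the problem early; it does not remove the Java requirement on platforms where installing a JRE is difficult.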