Broken output PDF in Adobe Acrobat DC

Open CleyFaye opened this issue 4 years ago • 0 comments

What were you trying to do?

We have a process that adds a few lines of text overlay to existing PDF documents. It is a very basic process that doesn't touch much of the existing content.

Attached is a PDF input that triggers the behavior described below: in.pdf

However, the issue can be reproduced without touching the content at all, so the rest of this report only loads and saves the file with pdf-lib.

How did you attempt to do it?

The minimal example boils down to simply opening and saving a specific PDF input:

import {readFileSync, writeFileSync} from "fs";
import {PDFDocument} from "pdf-lib";

const document = await PDFDocument.load(readFileSync("in.pdf"));
writeFileSync("broken.pdf", await document.save());
writeFileSync("ok.pdf", await document.save({useObjectStreams: false}));

This was done using pdf-lib v1.17.1 on Node v16.14.2.

What actually happened?

The output saved with the default options for save() cannot be opened in Acrobat Reader DC; it fails with an unhelpful message along the lines of "an error occurred and the file cannot be opened or repaired".

The one saved with useObjectStreams: false is fine.
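To confirm that the two outputs really differ in structure, the saved bytes can be scanned for the markers that distinguish the two layouts. The following is a minimal sketch, not part of pdf-lib: describePdfStructure is a hypothetical helper, and it assumes that the "/ObjStm" type name and the "xref" keyword appear uncompressed in the file, which should hold for stream dictionaries and for a classic cross-reference table.

```javascript
// Hypothetical helper (not a pdf-lib API): report which structural markers
// a PDF byte array contains.
const describePdfStructure = (bytes) => {
  // latin1 maps every byte to exactly one character, so binary stream data
  // cannot break the text search.
  const text = Buffer.from(bytes).toString("latin1");
  return {
    // Object streams are declared with "/Type /ObjStm" in a stream dictionary.
    objectStreams: text.includes("/ObjStm"),
    // A classic cross-reference table starts with "xref" on its own line.
    // "startxref" does not match because "xref" must follow a line break.
    classicXrefTable: /(^|[\r\n])xref[\r\n ]/.test(text),
  };
};

// Usage (file names taken from the script in this report):
// const {readFileSync} = require("fs");
// console.log(describePdfStructure(readFileSync("broken.pdf")));
// console.log(describePdfStructure(readFileSync("ok.pdf")));
```

If the assumption holds, "broken.pdf" should report object streams and "ok.pdf" a classic cross-reference table, which narrows the problem down to how the object streams are written.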

What did you expect to happen?

Both output files should be fine.

How can we reproduce the issue?

As said above, a simple script that loads and saves the provided input PDF produces a broken output. The input itself opens correctly in every tool I tested it with, including Acrobat Reader DC, the Chrome PDF viewer, and Firefox.

The full testing process:

  • create a new npm project in an empty directory (npm init)
  • install pdf-lib (npm install pdf-lib)
  • copy "in.pdf" into the project's directory
  • create a test script (test.js) with the content below
  • run node test
  • the output named "broken.pdf" cannot be opened in Acrobat Reader DC on Windows

// content of test.js
const {readFileSync, writeFileSync} = require("fs");
const {PDFDocument} = require("pdf-lib");

const main = async () => {
  const document = await PDFDocument.load(readFileSync("in.pdf"));
  writeFileSync("broken.pdf", await document.save());
  writeFileSync("ok.pdf", await document.save({useObjectStreams: false}));
};

main().catch(console.error);

Version

1.17.1

What environment are you running pdf-lib in?

Browser, Node

Checklist

  • [X] My report includes a Short, Self Contained, Correct (Compilable) Example.
  • [X] I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

The PDF document was produced using MS PowerPoint. The provided example input was cut down as much as possible while still triggering the issue.

Saving the document without object streams does work, but since this is not the default option, I believe it would be best to find out what is happening here.
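Until the cause is found, the workaround can be centralized so that every save in a project goes through it. This is only a sketch: saveWithoutObjectStreams is a hypothetical wrapper, not a pdf-lib function, and it relies solely on the save() option already used above.

```javascript
// Hypothetical wrapper (not part of pdf-lib): always save without object
// streams, while still passing through any other save() options.
const saveWithoutObjectStreams = async (pdfDocument, extraOptions = {}) =>
  // useObjectStreams is listed last so that callers cannot re-enable it by accident.
  pdfDocument.save({...extraOptions, useObjectStreams: false});

// Usage with the script from this report:
// writeFileSync("ok.pdf", await saveWithoutObjectStreams(document));
```

Placing useObjectStreams after the spread is a deliberate choice: the wrapper exists to enforce the workaround, so a caller-supplied value should not override it.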

CleyFaye, Apr 29 '22 09:04