pdfly: compressed PDF larger than original?

$ pdfly compress in.pdf out.pdf
Original Size  : 1,996,123
Compressed Size: 2,014,972 (100.9% of original)

How is this possible?

Jun 05 '24 02:06 eashalm

Please complete the report with test code, an input file, and the output file. As it stands, we cannot do any review.

Jun 05 '24 05:06 pubpub-zz

Please complete the report with test code, an input file, and the output file. As it stands, we cannot do any review.

I cannot provide the input and output files as they contain sensitive personal information. Just try it out with some PDFs on your computer and you'll see that the compress command is broken.

Jun 05 '24 05:06 eashalm

I am having the same issue with multiple pdf files.

$ pdfly compress Lockhart_2002_-_A_Mathematician\'s_Lament.pdf Lockhart_compressed.pdf
Ignoring wrong pointing object 0 0 (offset 0)
Ignoring wrong pointing object 91 0 (offset 0)
Ignoring wrong pointing object 93 0 (offset 0)
Original Size  : 400,277
Compressed Size: 418,320 (104.5% of original)

Attachments: Lockhart_2002_-_A_Mathematician's_Lament.pdf, Lockhart_compressed.pdf

Another example:

$ pdfly compress Example_form.pdf Output.pdf 
Original Size  : 95,569
Compressed Size: 103,325 (108.1% of original)

Strangely, trying to compress the output of this form reduces the size, although it is still larger than the original:

$ pdfly compress Output.pdf Out2.pdf
Original Size  : 103,325
Compressed Size: 98,634 (95.5% of original)

Attachments: Example_form.pdf, Output.pdf, Out2.pdf

Jul 18 '24 19:07 JellyJoe198

These cases are possible. The compression applies lossless compression to streams, but other techniques, such as building object streams, could reduce the size too. However, pypdf currently has no capability to build such streams or to define a strategy for compressing them. The only easy solution I can currently imagine would be to write the output into a stream, compare the sizes, and, if the result is larger than the original, just return the original file. If this sounds good to you, do not hesitate to propose a PR.
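A minimal sketch of that fallback, under stated assumptions: `compress_or_keep` is a hypothetical helper (not part of pdfly or pypdf), and `zlib.compress` only stands in for the real PDF compression step so that the example runs with the standard library alone.

```python
import os
import zlib
from typing import Callable, Tuple


def compress_or_keep(
    original: bytes, compress: Callable[[bytes], bytes]
) -> Tuple[bytes, bool]:
    """Return (data, used_compressed), keeping whichever version is smaller.

    `compress` is any bytes -> bytes function; nothing is written to disk here.
    """
    candidate = compress(original)  # compressed result held in memory
    if len(candidate) < len(original):
        return candidate, True
    # The compressed result is not smaller: fall back to the untouched input.
    return original, False


if __name__ == "__main__":
    # Repetitive data shrinks, so the compressed version is kept.
    repetitive = b"BT /F1 12 Tf (hello) Tj ET\n" * 1000
    data, used = compress_or_keep(repetitive, zlib.compress)
    print(used, len(repetitive), len(data))

    # Random bytes behave like an already-deflated stream: the second pass
    # adds overhead, so the original is returned instead.
    noise = os.urandom(4096)
    data, used = compress_or_keep(noise, zlib.compress)
    print(used, len(noise), len(data))
```

In a real command the caller would write the returned bytes to the output path, so the output is never larger than the input.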

Jul 18 '24 19:07 pubpub-zz

The only easy solution I can currently imagine would be to write the output into a stream, compare the sizes, and, if the result is larger than the original, just return the original file. If this sounds good to you, do not hesitate to propose a PR.

I think that's a good idea 🙂 We could also report the final file size and the compression ratio (80%? 10%?) in the pdfly compress output.

A PR implementing this would be welcome 🙂
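As an illustration of what such a report could look like, here is a small sketch that mirrors the two-line format already shown earlier in this thread. `format_report` and the extra fallback message are hypothetical, not existing pdfly code.

```python
def format_report(original_size: int, compressed_size: int) -> str:
    """Build a size report in the same layout as the output quoted in this thread."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    percent = compressed_size / original_size * 100
    lines = [
        f"Original Size  : {original_size:,}",
        f"Compressed Size: {compressed_size:,} ({percent:.1f}% of original)",
    ]
    if compressed_size >= original_size:
        # Hypothetical extra line for the fallback case discussed above.
        lines.append("No size reduction achieved; keeping the original file.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sizes taken from the first report in this issue.
    print(format_report(1_996_123, 2_014_972))
```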

Oct 10 '25 12:10 Lucas-C

This may be scope creep, but I was hoping there was an option for a lossy compression algorithm (jpeg?) on any embedded images in addition to PDF compression. So it could get the file size down for online forms with strict size requirements, especially if there's a lot of pictures.
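For reference, pypdf itself can already re-encode embedded images at a lower quality, so a rough sketch outside of pdfly might look like the following. The `recompress_images` wrapper and its `quality` default are assumptions for illustration, it is not a pdfly option, and the image access requires Pillow to be installed. The operation is lossy and cannot be undone.

```python
def recompress_images(src: str, dst: str, quality: int = 80) -> None:
    """Re-encode every embedded image of `src` at the given quality, write `dst`."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")

    # Third-party dependency, imported lazily so this module loads without it.
    from pypdf import PdfWriter

    writer = PdfWriter(clone_from=src)
    for page in writer.pages:
        for img in page.images:
            # img.image is a PIL image; replace() re-encodes it in place.
            img.replace(img.image, quality=quality)
    writer.write(dst)


if __name__ == "__main__":
    import sys

    # Usage: python recompress.py in.pdf out.pdf
    recompress_images(sys.argv[1], sys.argv[2])
```

Lower `quality` values give smaller files at the cost of visible artifacts, so it is worth checking the output before submitting it anywhere.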


Oct 11 '25 14:10 JellyJoe198

Hey @Lucas-C!

I've implemented a fix for the compression issue where files could end up larger than the original: #173

Approach: Added size comparison logic that writes to memory first, compares compressed vs original size, and keeps the smaller version. Also fixed metadata preservation during compression.

Improvements:

✅ Prevents file size increases (addresses the 104.5% issue)
✅ Better user feedback with compression metrics
✅ Preserves PDF metadata (title, author, etc.)
✅ Comprehensive test coverage

Could you please assign a Hacktoberfest label to my PR? Thanks!

Oct 12 '25 10:10 Kaos599