Fix: individual /Resources dictionaries are now properly produced for each document page

Open Lucas-C opened this issue 2 years ago • 3 comments

This change causes a slight increase in the size of PDF documents, but makes their structure more compliant with the PDF spec.

Checklist:

  • [x] The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • [ ] A unit test is covering the code added / modified by this PR

  • [ ] This PR is ready to be merged

  • [ ] In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • [x] A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

Lucas-C avatar Mar 06 '24 11:03 Lucas-C

Could you please review this @andersonhc or @gmischler?

Despite the 100+ reference PDF files modified, this PR only affects 2 Python source files and is relatively short.

Lucas-C avatar Mar 06 '24 11:03 Lucas-C

There are some alternative approaches to the one currently used in this PR:

  • instead of 3 extra dicts (.fonts_used_per_page_number / .images_used_per_page_number / .graphics_style_names_per_page_number), we could store that information as properties of PDFPage instances (in FPDF.pages)
  • instead of storing this information at all, we could generalize the regex-based approach used in FPDF.drawing_context() and perform those regex-matches in OutputProducer based on the final .contents of pages
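
The second alternative could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `resources_used` and the sample content stream are made up, and it assumes page contents are available as uncompressed bytes in which resources are referenced through the standard PDF operators Tf (font), Do (XObject, such as an image) and gs (graphics state).

```python
import re

# Standard PDF content stream operators:
#   /Name size Tf  -> select a font
#   /Name Do       -> paint an XObject (e.g. an image)
#   /Name gs       -> apply a graphics state dictionary
FONT_RE = re.compile(rb"/(\w+)\s+-?[\d.]+\s+Tf\b")
XOBJECT_RE = re.compile(rb"/(\w+)\s+Do\b")
GSTATE_RE = re.compile(rb"/(\w+)\s+gs\b")


def resources_used(contents: bytes) -> dict:
    """Return the resource names referenced in one page's content stream.

    Hypothetical helper: it only scans the bytes with regexes and does not
    parse the stream, so it is an approximation.
    """
    def names(pattern):
        return sorted({match.decode("ascii") for match in pattern.findall(contents)})

    return {
        "fonts": names(FONT_RE),
        "xobjects": names(XOBJECT_RE),
        "graphics_states": names(GSTATE_RE),
    }


# Made-up content stream, for illustration only
sample = (
    b"BT /F1 12.00 Tf (Hello) Tj ET "
    b"q 100 0 0 50 10 10 cm /I2 Do Q "
    b"/GS1 gs BT /F4 12.00 Tf (World) Tj ET"
)
print(resources_used(sample))
```

A plain regex scan like this could also match operator-like text inside string literals, so a real implementation would likely need to be more careful than this sketch.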

Lucas-C avatar Mar 07 '24 08:03 Lucas-C

I will try to make time for a complete review by tomorrow. For now I am running some tests and, interestingly, your PR makes it clear that we are letting unused fonts end up in the output document.

An example:

```python
from fpdf import FPDF

pdf = FPDF()
# Register four styles of the same family
pdf.add_font("NotoSans", "B", "NotoSans-Bold.ttf")
pdf.add_font("NotoSans", "BI", "NotoSans-BoldItalic.ttf")
pdf.add_font("NotoSans", "I", "NotoSans-Italic.ttf")
pdf.add_font("NotoSans", "", "NotoSans-Regular.ttf")

# Select the regular style; no text is ever rendered with it below
pdf.set_font("NotoSans", "", 12)
pdf.single_resources_object = False

# Page 1 only renders bold text
pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)

# Page 2 only renders italic text
pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)

pdf.output("test1.pdf")
```

Page 1 will have "F1" and "F4" in its resources dictionary, and page 2 will have "F3" and "F4". "F2" is added to the final document but not referenced at all. "F4" is added to the resource list because of set_font(), although it is not used.

I have documents with many fallback fonts and there is a considerable amount of unused font data added to the documents.

Might be something to tackle in a future PR.
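
The observation above amounts to a set difference between the fonts embedded in the document and the fonts referenced by any page. A minimal sketch, where the helper name `unreferenced_fonts` and the shape of its inputs are hypothetical and not part of fpdf2:

```python
def unreferenced_fonts(registered, fonts_per_page):
    """Return the registered font names that no page references.

    registered: iterable of font resource names embedded in the document.
    fonts_per_page: mapping of page number to the set of font names
    listed in that page's resources (hypothetical input shape).
    """
    referenced = set().union(*fonts_per_page.values())
    return sorted(set(registered) - referenced)


# Values taken from the example above
print(unreferenced_fonts(
    ["F1", "F2", "F3", "F4"],
    {1: {"F1", "F4"}, 2: {"F3", "F4"}},
))
```

This only catches fonts such as "F2" that no page lists at all. A font like "F4", which is listed in a page's resources without being used to render text, would need a check against the page contents instead.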

andersonhc avatar Mar 07 '24 20:03 andersonhc

I opened issue https://github.com/py-pdf/fpdf2/issues/1382 regarding this.

Lucas-C avatar Mar 03 '25 09:03 Lucas-C