Fix: individual /Resources dictionaries are now properly produced for each document page
This change causes a slight increase in the size of PDF documents, but makes their structure more compliant with the PDF spec.
Checklist:
- [x] The GitHub pipeline is OK (green), meaning that both `pylint` (static code analyzer) and `black` (code formatter) are happy with the changes of this PR.
- [ ] A unit test is covering the code added / modified by this PR
- [ ] This PR is ready to be merged
- [ ] In case of a new feature, docstrings have been added, with also some documentation in the `docs/` folder
- [x] A mention of the change is present in `CHANGELOG.md`
By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.
Could you please review this @andersonhc or @gmischler?
Despite the 100+ reference PDF files modified, this PR only affects 2 Python source files and is relatively short.
There are some alternative approaches to the one currently used in this PR:
- instead of 3 extra dicts (`.fonts_used_per_page_number` / `.images_used_per_page_number` / `.graphics_style_names_per_page_number`), we could store this information as properties of `PDFPage` instances (in `FPDF.pages`)
- instead of storing this information at all, we could generalize the regex-based approach used in `FPDF.drawing_context()` and perform those regex matches in `OutputProducer`, based on the final `.contents` of pages
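To make the second alternative more concrete, here is a rough sketch of what scanning a page's content stream could look like. It is not the code of this PR: `resources_used_in()` is a hypothetical helper, and the `F<n>` / `I<n>` / `GS<n>` resource name patterns are assumptions about how resources are named in the content stream. It also assumes the stream is scanned before any compression is applied.

```python
import re

# Assumed resource naming: /F<n> for fonts, /I<n> for images, /GS<n> for graphics states.
FONT_RE = re.compile(rb"/(F\d+)\s+[\d.]+\s+Tf")  # "/F1 12.00 Tf" selects a font
IMAGE_RE = re.compile(rb"/(I\d+)\s+Do")          # "/I2 Do" paints an image XObject
GSTATE_RE = re.compile(rb"/(GS\d+)\s+gs")        # "/GS1 gs" applies a graphics state


def resources_used_in(contents: bytes) -> dict:
    """Hypothetical helper: list the resource names referenced by a page content stream."""
    return {
        "fonts": sorted({name.decode() for name in FONT_RE.findall(contents)}),
        "images": sorted({name.decode() for name in IMAGE_RE.findall(contents)}),
        "graphics_styles": sorted({name.decode() for name in GSTATE_RE.findall(contents)}),
    }


# A hand-written, uncompressed content stream used only for illustration.
sample = (
    b"BT /F1 12.00 Tf (Hello) Tj ET\n"
    b"/GS1 gs\n"
    b"q 100 0 0 50 10 20 cm /I2 Do Q\n"
    b"BT /F4 12.00 Tf (World) Tj ET"
)
print(resources_used_in(sample))
```

One caveat with this approach: a literal text string that happens to contain something like `/F1 12 Tf` would be matched as well, so the explicit per-page dicts are the more robust option.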
I will try to make some time for a complete review by tomorrow. For now I am running some tests, and it's interesting that your PR makes it clear we are letting unused fonts end up in the output document.
An example:
from fpdf import FPDF
pdf = FPDF()
pdf.add_font("NotoSans", "B", "NotoSans-Bold.ttf")
pdf.add_font("NotoSans", "BI", "NotoSans-BoldItalic.ttf")
pdf.add_font("NotoSans", "I", "NotoSans-Italic.ttf")
pdf.add_font("NotoSans", "", "NotoSans-Regular.ttf")
pdf.set_font("NotoSans", "", 12)
pdf.single_resources_object = False
pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="**Text in bold**", markdown=True)
pdf.add_page()
pdf.multi_cell(w=pdf.epw, text="__Text in italic__", markdown=True)
pdf.output("test1.pdf")
Page 1 will have "F1" and "F4" in its resources dictionary, and page 2 will have "F3" and "F4".
"F2" is added to the final document but not referenced at all. "F4" is added to the resource list because of `set_font()`, although it is not used.
I have documents with many fallback fonts, and a considerable amount of unused font data is added to them.
Might be something to tackle in a future PR.
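As a rough illustration of how the unreferenced fonts could be spotted, here is a small sketch based on the numbers from the example above. `unused_fonts()` is a hypothetical helper, not part of fpdf2, and the dict only mimics the shape of `.fonts_used_per_page_number` with hand-written values.

```python
def unused_fonts(registered, used_per_page):
    """Hypothetical helper: registered font names that no page references."""
    used = set().union(*used_per_page.values())
    return sorted(set(registered) - used)


# Values taken from the example: 4 registered fonts, 2 pages.
registered = ["F1", "F2", "F3", "F4"]
fonts_used_per_page_number = {1: {"F1", "F4"}, 2: {"F3", "F4"}}

print(unused_fonts(registered, fonts_used_per_page_number))  # ['F2']
```

Note that "F4" is not reported here, since it is listed in the page resources even though no text is rendered with it. Catching that case would require tracking the fonts actually used to render text.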
I opened issue https://github.com/py-pdf/fpdf2/issues/1382 regarding this.