pypdf PDF-Form not editable after filling out text field (after upgrade from 3.9.* to 4.3*)

Last year I wrote a small Python program that fills out a PDF form, and everything worked as expected: after running the program I could review the created file and even change its contents. Since I only use this program once a year, I don't follow the changes to pypdf closely. When I revisited the program this year, I noticed that several pypdf updates had been released. As I don't like to run outdated software, I upgraded to the latest version, updated my code according to the documentation, and reran my program.

The good news first: the form gets filled out. But when I open the filled-out PDF, I get a warning that the "extended features" (see attached screenshot) are no longer available. I could live with that, because it is just annoying, but I'm also no longer able to edit the contents of the PDF, which is a problem.

[Screenshot of the warning]

I've tried multiple versions of pypdf, and as far as I can tell a change somewhere between 3.9 and 3.11 causes this behavior. I've also attached the PDFs created by the different pypdf versions:

f5471sm-3.9.1.pdf f5471sm-4.3.1.pdf

Environment

$ python -m platform
Windows-10-10.0.22631-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Code + PDF

For demonstration purposes I've boiled the code down as much as possible.

This is the form I'm using: https://www.irs.gov/pub/irs-pdf/f5471sm.pdf

import pypdf
from pypdf import PdfReader
from pypdf import PdfWriter

form = PdfReader("f5471sm.pdf")
fields = form.get_form_text_fields()
writer = PdfWriter()

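# Fill every text field with its own name so the result is easy to check.
for key in fields:
    fields[key] = key

# Compare the whole major version number, not just its first character.
if int(pypdf.__version__.split(".")[0]) >= 4:
    # pypdf 4.x: clone the document root so the AcroForm is carried over.
    writer.clone_reader_document_root(form)
    writer.update_page_form_field_values(None, fields)
else:
    # pypdf 3.x: copy the pages only, then update the fields page by page.
    for page in form.pages:
        writer.add_page(page)
    for page in writer.pages:
        writer.update_page_form_field_values(page, fields)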

with open("f5471sm-"+pypdf.__version__+".pdf","wb") as file:
    writer.write(file)

writer.close()

Aug 01 '24 08:08 ljbergmann

Your code for v3.9 is not valid, as you are not transferring the AcroForm. By doing this you are losing the form/field extraction capability.

The PDF you are using contains an XFA and seems to be signed. I need more time to understand how this could be handled to prevent the reported warning.
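
For anyone who wants to check these properties on their own files, here is a minimal sketch that reports which form-related entries a PDF catalog contains. The function names `describe_form` and `describe_pdf` are made up for this example and are not part of pypdf. I am also assuming that the "extended features" warning relates to the usage-rights signature stored under `/Perms` → `/UR3`, which this thread has not confirmed.

```python
from typing import Any, Mapping


def describe_form(root: Mapping[str, Any]) -> dict:
    """Report which form-related entries a PDF catalog (/Root) contains."""
    has_acroform = "/AcroForm" in root
    acroform = root["/AcroForm"] if has_acroform else {}
    perms = root["/Perms"] if "/Perms" in root else {}
    return {
        # Without /AcroForm the file has no interactive form at all.
        "acroform": has_acroform,
        # /XFA means the form also carries an XML (XFA) description.
        "xfa": "/XFA" in acroform,
        # /SigFlags is set when the form contains signature fields.
        "sig_flags": "/SigFlags" in acroform,
        # Assumption: /UR3 is the usage-rights signature behind the warning.
        "usage_rights": "/UR3" in perms,
    }


def describe_pdf(path: str) -> dict:
    """Hypothetical convenience wrapper that reads the catalog with pypdf."""
    # Imported here so describe_form() stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return describe_form(reader.trailer["/Root"])
```

Calling `describe_pdf` on the original form and on each generated file (for example `f5471sm-3.9.1.pdf` and `f5471sm-4.3.1.pdf`) should show which of these entries survive a given pypdf version.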

Aug 04 '24 11:08 pubpub-zz

I'm also trying to figure out how to handle forms and signed PDFs to prevent the warning and ensure proper form field extraction.

Aug 04 '24 12:08 Harry262000

Thank you very much for your input, @pubpub-zz. Maybe the code is not valid or does not use the library as one should, but (and I only say this to explain why I posted this code) it gave me the results I was trying to get. As I stated, the warning is not necessarily a big deal, but not being able to change the content is a bit of a bummer, because everything else works perfectly fine.

Aug 04 '24 13:08 ljbergmann

Just for the archive: f5471sm.pdf

Aug 29 '24 08:08 pubpub-zz

The document can now be written incrementally. However, to make the data visible you still need to modify the dataset in the XFA form (tracked in https://github.com/py-pdf/pypdf/issues/2824).
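
To check whether an output file really is an incremental update, note that an incremental save appends new data to the end of the file and leaves the original bytes untouched. The sketch below tests exactly that. The names `appended_bytes` and `compare_files` are hypothetical helpers for this example, not pypdf API.

```python
from pathlib import Path


def appended_bytes(original: bytes, updated: bytes) -> int:
    """Return the number of bytes appended to `original`, or -1 if the
    original bytes were rewritten (that is, not an incremental update)."""
    if not updated.startswith(original):
        return -1
    return len(updated) - len(original)


def compare_files(original_path: str, updated_path: str) -> int:
    """Run the same check on two files on disk."""
    return appended_bytes(
        Path(original_path).read_bytes(),
        Path(updated_path).read_bytes(),
    )
```

If I read the pypdf 5.x documentation correctly, incremental mode is requested with `PdfWriter(clone_from=..., incremental=True)`. A file written that way should give a positive result from `compare_files("f5471sm.pdf", ...)`, while the files from the original report would be expected to give -1.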

Sep 01 '24 15:09 pubpub-zz