
ComboBox choice_values full of empty strings despite PDF having valid choices.

Open · sarahkittyy opened this issue 1 year ago · 3 comments

Description of the bug

I am using IRS form 940b: https://www.irs.gov/pub/irs-pdf/f940b.pdf

The PDF file has identical pages, and each page has the same "Mail to:" dropdown.

On the first page, widget.choice_values is a list of empty strings instead of the dropdown's options.

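import pymupdf

pdf = pymupdf.open('f940b.pdf')

for page in pdf:
    for widget in page.widgets():
        # Print the options of every combo box on the page
        if widget.field_type_string == 'ComboBox':
            print(widget.choice_values)
        # Write each widget back; the first combo box comes out empty in the saved file
        widget.update()
pdf.save('f940b-output.pdf')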

Expected output:

[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']
[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']

Actual output:

['', '', '', '', '', '']
[' - Select One - ', '  ', 'Cincinnati, OH 45999', 'Memphis, TN 37501', 'Ogden, UT 84201', 'Philadelphia, PA 19255']

This also affects the saved f940b-output.pdf: the first combo box ends up completely empty, with no choices available.

How to reproduce the bug

See above

PyMuPDF version

1.24.13

Operating system

Linux

Python version

3.12

sarahkittyy, Dec 04 '24 21:12

<<
  /Rect [ 213.206 341.196 391.491 361.361 ]
  /Subtype /Widget
  /Parent 86 0 R
  /F 4
  /P 505 0 R
  /StructParent 42
  /Type /Annot
  /MK <<
    /BG [ 1 ]
  >>
  /AP <<
    /N 533 0 R
  >>
>>
<<
  /Rect [ 213.206 341.196 391.491 361.361 ]
  /Subtype /Widget
  /TU (Mail to:)
  /Parent 86 0 R
  /F 4
  /I 47 0 R
  /P 1 0 R
  /StructParent 62
  /V 36 0 R
  /DA (/Helv 12 Tf 0 g)
  /DV 51 0 R
  /Opt 52 0 R
  /Type /Annot
  /Ff 4325376
  /MK <<
    /BG [ 1 ]
  >>
  /AP <<
    /N 45 0 R
  >>
>>
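The second widget dictionary carries /Ff 4325376. As a side check, that value can be decoded against the field flag bit positions defined in the PDF specification. The helper below is a hypothetical sketch, not part of PyMuPDF; it only lists the common flags (bits 1-3) and the choice-field flags. It shows the value is Combo plus DoNotSpellCheck, which is consistent with PyMuPDF reporting this widget as 'ComboBox'.

```python
# Field flag bit positions (1-based) from the PDF specification:
# bits 1-3 apply to all fields, the rest are specific to choice fields (/FT /Ch).
FIELD_FLAGS = {
    1: "ReadOnly",
    2: "Required",
    3: "NoExport",
    18: "Combo",
    19: "Edit",
    20: "Sort",
    22: "MultiSelect",
    23: "DoNotSpellCheck",
    27: "CommitOnSelChange",
}


def decode_field_flags(ff: int) -> list[str]:
    """Return the names of the flags set in a choice field's /Ff value."""
    return [name for bit, name in sorted(FIELD_FLAGS.items()) if ff & (1 << (bit - 1))]


print(decode_field_flags(4325376))  # ['Combo', 'DoNotSpellCheck']
```

4325376 is 131072 (bit 18, Combo) plus 4194304 (bit 23, DoNotSpellCheck).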

It seems that in this form, the dropdown on the first page has no /Opt key; only the one on the second page does. Yet every PDF viewer shows the options in both dropdowns as expected. What other key links the first dropdown to these choices?

sarahkittyy, Dec 04 '24 21:12

Both of these widgets have /Parent 86 0 R, which links to this field dictionary:

<<
  /TU (Mail to:)
  /I 87 0 R
  /T (p1-t14)
  /V 88 0 R
  /DA (/Helv 12 Tf 0 g)
  /DV 89 0 R
  /Opt 90 0 R
  /FT /Ch
  /Ff 4325376
  /Kids [ 532 0 R 46 0 R ]
>>

And /Opt 90 links to the correct list of options:

[ ( - Select One - ) (  ) (Cincinnati, OH 45999) (Memphis, TN 37501)
  (Ogden, UT 84201) (Philadelphia, PA 19255) ]

So PyMuPDF needs to account for the fact that the /Opt key might live in the /Parent object rather than in the widget itself.
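The lookup suggested here (check the widget first, then walk up its /Parent chain until /Opt is found) can be sketched in plain Python. This is not PyMuPDF's implementation: the objects are a hypothetical in-memory model of the dictionaries quoted in this thread, with an indirect reference such as 86 0 R written as the int 86. It is assumed that the first-page widget without /Opt is object 532, and that object 52 holds the same list that the actual output prints for the second widget.

```python
from typing import Any, Optional

# Hypothetical model of the PDF objects quoted above.
# An indirect reference "N 0 R" is represented by the int N.
MAIL_TO_OPTIONS = [" - Select One - ", "  ", "Cincinnati, OH 45999",
                   "Memphis, TN 37501", "Ogden, UT 84201", "Philadelphia, PA 19255"]

OBJECTS: dict[int, Any] = {
    # Parent field: carries /T, /FT, /Ff and the /Opt array (object 90)
    86: {"T": "p1-t14", "FT": "Ch", "Ff": 4325376, "Opt": 90, "Kids": [532, 46]},
    90: MAIL_TO_OPTIONS,
    # Assumed to be the first-page widget, which has no /Opt of its own
    532: {"Subtype": "Widget", "Parent": 86},
    # Second-page widget, which repeats /Opt (object 52)
    46: {"Subtype": "Widget", "Parent": 86, "Opt": 52},
    # Assumed to match the second line of the actual output
    52: list(MAIL_TO_OPTIONS),
}


def find_key(objects: dict[int, Any], xref: int, key: str) -> Optional[Any]:
    """Return `key` from the object at `xref` or from the nearest /Parent that has it."""
    seen: set[int] = set()
    current: Optional[int] = xref
    while current is not None and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        obj = objects[current]
        if key in obj:
            return obj[key]
        current = obj.get("Parent")
    return None


def choice_values(objects: dict[int, Any], widget_xref: int) -> list[str]:
    """Resolve /Opt for a widget, falling back to its parent field."""
    opt = find_key(objects, widget_xref, "Opt")
    if opt is None:
        return []
    # /Opt may be an indirect reference (an int in this model) or an inline array
    values = objects[opt] if isinstance(opt, int) else opt
    return list(values)


for xref in (532, 46):
    print(xref, choice_values(OBJECTS, xref))
```

With the parent fallback, both widgets resolve to the same six options. Real /Opt entries may also be two-element [export value, display text] arrays; this sketch does not handle that case.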

sarahkittyy, Dec 04 '24 21:12

Also, widget.field_value = widget.choice_values[foo] doesn't work either; it leaves the field in the output PDF completely blank.

sarahkittyy, Dec 04 '24 21:12