apply_redactions causes part of the page content to be hidden / transparent

Open beeing opened this issue 1 year ago • 25 comments

Description of the bug

I'm adding a redaction region to a part of the PDF, but after calling apply_redactions(), one side of the entire page goes missing (when opened in the macOS Preview app or Safari).

Further inspection reveals that the text is not missing, as it is selectable and can be copied out properly. It seems the text has been masked or hidden, but I could not figure out how to investigate further (sorry, my knowledge of PDF structure is limited).

The media, crop, art, bleed and trim boxes all look fine before and after the redactions. I also checked whether other paths or objects might be causing it, but found nothing.
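Below is a minimal sketch of the before/after box comparison described above. It assumes PyMuPDF's `Page.mediabox`, `Page.cropbox`, `Page.artbox`, `Page.bleedbox` and `Page.trimbox` properties. The helper names and the example file name are hypothetical. `pymupdf` is imported lazily, so the pure comparison helpers work on their own.

```python
BOX_NAMES = ("mediabox", "cropbox", "artbox", "bleedbox", "trimbox")


def page_boxes(page):
    """Return a page's boxes as plain (x0, y0, x1, y1) tuples, keyed by name."""
    return {name: tuple(getattr(page, name)) for name in BOX_NAMES}


def changed_boxes(before, after):
    """List the names of boxes whose coordinates differ between two snapshots."""
    return [name for name in BOX_NAMES if before.get(name) != after.get(name)]


def check_boxes_around_redaction(path, rect, page_no=0):
    """Redact `rect` on one page (in memory only) and report which boxes changed.

    Hypothetical helper: nothing is saved back to disk.
    """
    import pymupdf  # lazy import so the helpers above do not require PyMuPDF

    with pymupdf.open(path) as doc:
        page = doc[page_no]
        before = page_boxes(page)
        page.add_redact_annot(rect)
        page.apply_redactions()
        after = page_boxes(page)
    return changed_boxes(before, after)
```

For example, if `check_boxes_around_redaction("original.pdf", (0, 0, 1, 1))` returns an empty list, that would match the observation that none of the boxes change.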

Note that I'm not able to share the actual PDF, but it was generated by Puppeteer / Chromium (PDF version 1.7).

Thanks in advance for looking into this.

How to reproduce the bug

  1. Generate a PDF with Chromium / Puppeteer.
  2. Add a redaction of any size, e.g. (0, 0, 1, 1), and call apply_redactions().
  3. Open the PDF in the Preview app.

PyMuPDF version

1.24.9

Operating system

MacOS

Python version

3.9

beeing, Aug 05 '24 13:08

I faced the same problem. Note: the problem does not occur with version 1.23.9; all later versions have it.

Interestingly, if you open the redacted PDF in Chrome or LibreOffice, it looks as expected. But the issue is reproducible at least with Mac Preview and the react-pdf-viewer library.

Kyrylo-Hrytsenko, Aug 05 '24 14:08

As always: please provide an example PDF! There is no other way to deal with this post.

JorjMcKie, Aug 06 '24 05:08

I've just tested, and it works with older versions (up to pymupdf-1.23.26).

It may be easier to compare the commits from https://github.com/pymupdf/PyMuPDF/commit/a868c0a556e39198549e4e139534bb12b2623c5d up to HEAD.

beeing, Aug 06 '24 14:08

@JorjMcKie Here is an example: original.pdf, redacted.pdf

Please open the redacted file with the Preview app; it will look like this: [Screenshot 2024-08-06 at 3 38 40 PM]

The code for redaction looks like this:

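```python
from io import BytesIO

import fitz  # PyMuPDF

input_file = "original.pdf"   # assumed name of the attached source file
output_file = "redacted.pdf"  # assumed name for the redacted result

pdfIn = fitz.open(input_file)
out_buffer = BytesIO()

page = pdfIn[0]

# Four black-filled redaction areas along the page diagonal
page.add_redact_annot([0, 0, 100, 100], text=None, fill=(0, 0, 0))
page.add_redact_annot([100, 100, 200, 200], text=None, fill=(0, 0, 0))
page.add_redact_annot([200, 200, 300, 300], text=None, fill=(0, 0, 0))
page.add_redact_annot([300, 300, 400, 400], text=None, fill=(0, 0, 0))

page.apply_redactions()

pdfIn.save(out_buffer, garbage=3, deflate=True)
pdfIn.close()

# The with-block closes the file; no explicit f.close() is needed
with open(output_file, mode="wb") as f:
    f.write(out_buffer.getbuffer())
```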

Kyrylo-Hrytsenko, Aug 06 '24 14:08

@JorjMcKie May I ask if you have received the PDF for reproducing the issue?

Kyrylo-Hrytsenko, Aug 16 '24 09:08

@Kyrylo-Hrytsenko Thanks, I did.

I executed the script and found no problem at all using v1.24.9. I modified the script somewhat so redaction rectangles are visible and erased areas are not filled:

```python
import pymupdf

print(pymupdf.version)
pdfIn = pymupdf.open("original.pdf")

page = pdfIn[0]
rects = (
    [0, 0, 100, 100],
    [100, 100, 200, 200],
    [200, 200, 300, 300],
    [300, 300, 400, 400],
)
for r in rects:
    page.draw_rect(r, color=(1, 0, 0))
    page.add_redact_annot(r)

page.apply_redactions()

pdfIn.ez_save("output.pdf")
```

This gives the correct result: output.pdf

JorjMcKie, Aug 16 '24 10:08

@JorjMcKie Your result file looks like this for me in the Preview app: [Screenshot 2024-08-16 at 1 59 20 PM]

Notes:

  • I highlighted some text with the mouse to show that it is still present in the PDF, but for some reason, it's hidden.
  • There is no issue when I open this file in Chrome or any other app.
  • I checked version 1.23.9, and there was no such issue. It seems to have started happening in version 1.24.0.

Does the output.pdf look normal when you open it in the 'Preview' application?

Kyrylo-Hrytsenko, Aug 16 '24 11:08

I do not use or have Preview. My file displays correctly in every PDF viewer I tried, such as Adobe Acrobat, Foxit, Nitro, PDF XChange and evince (Linux). So all authoritative applications behave correctly. I have no idea what is wrong with Preview.

JorjMcKie, Aug 16 '24 12:08

I can confirm that I also see this problem in Preview. However, it is fine when I open the file in Adobe Acrobat. To me this looks like a Preview rendering bug; I would submit a bug report to Apple if that is possible!

jamie-lemon, Aug 16 '24 12:08

@jamie-lemon absolutely correct! I was about to write a similar comment. We will now close this issue.

JorjMcKie, Aug 16 '24 12:08

@jamie-lemon @JorjMcKie I don't think it's Preview bug only, here is why:

  • For me, it happens not only with Preview but also with the react-pdf-viewer library at least
  • With an older version of your library everything works fine, which means something was changed and caused this issue
  • Original files (before redaction) render correctly with Preview and with react-pdf-viewer, which means something in the redaction process causes this issue.

Kyrylo-Hrytsenko, Aug 16 '24 12:08

@JorjMcKie @jamie-lemon In addition to Mac Preview, Safari, UPDF, and PDF Expert also fail to display output.pdf correctly.

yuhuang-cst, Aug 16 '24 13:08

This is a strange bug. I thought it might be related to the content on page 1, but I simplified things and targeted the 2nd page with an area redaction:

```python
import pymupdf

print(pymupdf.version)
pdfIn = pymupdf.open("original.pdf")

page = pdfIn[1]  # 2nd page
rects = (
    [0, 0, 100, 100],
)
for r in rects:
    page.draw_rect(r, color=(1, 0, 0))
    page.add_redact_annot(r)

page.apply_redactions()

pdfIn.ez_save("redacted.pdf")
```

Then I get: [Screenshot 2024-08-16 at 18 05 59]

I also noticed that it doesn't matter how big the redaction area is; I could do this:

```python
rects = (
    [0, 0, 0, 0],
)
```

And get the same problem on the left-hand side of the page. I could also put that rect anywhere on the page; it didn't have to be in the top left.

When I redact other documents and view them in Preview, I don't see this issue at all, so I think there must be something very specific to this document that will need further research.
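One way to dig into what makes a document "specific" is to compare the page's content stream before and after redaction. The sketch below is a rough diagnostic, not MuPDF's own analysis. It splits the stream on whitespace, so operators that appear inside text strings or inline images can be miscounted. It then counts a few graphics-state, clipping and XObject operators and checks that the `q`/`Q` save/restore pairs stay balanced. It assumes PyMuPDF's `Page.read_contents()`. The helper names and the choice of watched operators are hypothetical.

```python
from collections import Counter

# Assumption: these operators are a reasonable first place to look, since they
# change the transform, clipping or graphics state of everything drawn after them
WATCHED = {b"q", b"Q", b"cm", b"re", b"W", b"W*", b"n", b"gs", b"Do"}


def count_operators(content):
    """Roughly count watched operators in a content stream (whitespace tokens only)."""
    return Counter(tok.decode() for tok in content.split() if tok in WATCHED)


def q_balanced(content):
    """True if q/Q nesting never drops below zero and ends at depth zero."""
    depth = 0
    for tok in content.split():
        if tok == b"q":
            depth += 1
        elif tok == b"Q":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def compare_redaction(path, rect, page_no=0):
    """Hypothetical helper: operator counts and q/Q balance before/after redacting."""
    import pymupdf  # lazy import so the pure helpers above do not require PyMuPDF

    with pymupdf.open(path) as doc:
        page = doc[page_no]
        before = page.read_contents()
        page.add_redact_annot(rect)
        page.apply_redactions()
        after = page.read_contents()
    return {
        "before": (count_operators(before), q_balanced(before)),
        "after": (count_operators(after), q_balanced(after)),
    }
```

Running `compare_redaction("original.pdf", (0, 0, 0, 0), page_no=1)` mirrors the empty-rectangle experiment above. Differences in the `cm`, `W`/`n` or `q`/`Q` counts would be candidates for a closer look, though they do not by themselves prove where the bug is.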

jamie-lemon, Aug 16 '24 17:08

@jamie-lemon

so I think there must be something very specific to this document which will need further research.

Totally agree. Only a small number of my documents have this bug. I didn't plan to write at first, but then I noticed that this was happening to others too and that an issue already existed, so I added my example as well.

Kyrylo-Hrytsenko, Aug 16 '24 18:08

@Kyrylo-Hrytsenko Much appreciated!

jamie-lemon, Aug 16 '24 20:08

This is the simplest case I could find: I made this PDF in Adobe Acrobat, opened it in Preview, and then used "Export" to save it as a new PDF.

preview-made.pdf

When you redact it with PyMuPDF, the logo disappears when you view the result in Preview, e.g.:

[Screenshot 2024-08-16 at 21 24 35]

jamie-lemon, Aug 16 '24 20:08

So it seems that whether the PDF was made in Preview may have something to do with the problem.
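If Preview-exported files are the common factor, you can flag them before redacting by inspecting the document metadata. The sketch below is a heuristic only. It assumes that PDFs written by macOS Preview carry a Producer string containing "Quartz PDFContext", which is typical of Apple's Quartz PDF writer but not guaranteed. The helper name is hypothetical. `Document.metadata` is PyMuPDF's metadata dictionary.

```python
def looks_like_quartz_export(metadata):
    """Heuristic: True if the Producer metadata mentions Apple's Quartz PDF writer.

    Assumption: Preview/Quartz exports use a Producer such as
    "macOS Version ... Quartz PDFContext"; other producers are not flagged.
    """
    producer = metadata.get("producer") or ""
    return "quartz pdfcontext" in producer.lower()


# Usage with PyMuPDF (the file name is only an example):
# import pymupdf
# with pymupdf.open("preview-made.pdf") as doc:
#     print(looks_like_quartz_export(doc.metadata))
```

A positive result does not mean redaction will misbehave. It only marks files that match the pattern reported in this thread.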

jamie-lemon, Aug 16 '24 20:08

The issue I am encountering is that if apply_redactions is used, the vector graphics on the page all move to the bottom left corner in Preview, Safari, UPDF, and PDF Expert, whereas they display correctly in Chrome, Adobe Acrobat Reader, and WPS. Here is the code:

```python
import fitz

doc = fitz.open('origin.pdf')
page = doc.load_page(0)
page.add_redact_annot((0, 0, 0, 0), fill=False)  # empty rect, no fill
page.apply_redactions()
doc.ez_save('apply_redaction.pdf')
doc.close()
```

Attachments: origin.pdf, apply_redaction.pdf, [image]

The origin.pdf is from the second page of the AlphaGo paper: https://www.nature.com/articles/nature16961

yuhuang-cst, Aug 17 '24 03:08

The PDF generated with PyMuPDF version 1.23.26 displays the vector graphics correctly in Preview (although the image in the top right corner is partially missing). However, starting from version 1.24.0, the vector graphics are moved to the bottom left corner. Attachments: apply_redaction_1.23.26.pdf, apply_redaction_1.24.0.pdf

yuhuang-cst, Aug 17 '24 03:08

It seems that primarily Mac-based tools have problems with redacted PDFs that were created with Preview. I am experimenting with the MuPDF development version 1.25.0; the current PyMuPDF v1.24.9 uses MuPDF v1.24.8.

When creating and applying annotations using PyMuPDF 1.24.9 with MuPDF 1.25.0, I no longer see the error in the Firefox browser, which otherwise misbehaves just like all those Mac apps.

I am attaching the resulting output.pdf and invite Mac users to open it in Preview on their Mac: output.pdf

JorjMcKie, Aug 17 '24 12:08

[image] It seems that this bug still exists in Mac Preview.

yuhuang-cst, Aug 17 '24 12:08

@yuhuang-cst thanks for the feedback anyway

JorjMcKie, Aug 17 '24 12:08

I can also confirm that the bug does not occur in PyMuPDF version 1.23.9.

jamie-lemon, Aug 17 '24 12:08

I have submitted a problem report in MuPDF's system here: https://bugs.ghostscript.com/show_bug.cgi?id=707966

JorjMcKie, Aug 21 '24 12:08

Reporting the same issue.

K8S666, Sep 27 '24 07:09

Apparently the MuPDF master branch has a fix, so PyMuPDF itself will be fixed when we make a release with MuPDF 1.25.0.

However, I don't currently know when MuPDF 1.25.0 will be released.

@julian-smith-artifex-com @JorjMcKie Let's release soon to hopefully fix this one! Related I think: https://github.com/pymupdf/PyMuPDF/discussions/4029

jamie-lemon, Nov 06 '24 21:11

Fixed in PyMuPDF-1.25.0.
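Until upgrading is possible, a script can warn when it runs on a release in the affected range. The sketch below assumes the range reported in this thread: the problem was first seen in 1.24.0 and fixed in 1.25.0, though one comment suggests that some earlier 1.23.x releases may also be affected. It also assumes that `pymupdf.version[0]` holds the PyMuPDF version string. The helper names are hypothetical.

```python
import re
import warnings

# Assumption: range taken from the reports in this thread, not from a changelog
AFFECTED_FROM = (1, 24, 0)
FIXED_IN = (1, 25, 0)


def version_tuple(text):
    """Turn a string like '1.24.9' or '1.25.0rc1' into (major, minor, patch)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_affected_range(text):
    """True if the given PyMuPDF version falls in the reported affected range."""
    return AFFECTED_FROM <= version_tuple(text) < FIXED_IN


def warn_if_affected():
    """Emit a warning if the installed PyMuPDF is in the affected range."""
    import pymupdf  # lazy import so the version helpers work without PyMuPDF

    current = pymupdf.version[0]
    if in_affected_range(current):
        warnings.warn(
            f"PyMuPDF {current}: redacted pages may render incorrectly in "
            "macOS Preview and similar viewers; consider upgrading to 1.25.0+."
        )
```

Calling `warn_if_affected()` once, before any redaction work, keeps the check out of the redaction code itself.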