Searching Issue in icepdf viewer

Open Muhammad-Muddasir opened this issue 2 years ago • 7 comments

Hi, I'm using ICEpdf version 7.0.2 in my project, but searching does not work properly in some reports.

Searching Issue

Searching Issue.pdf

Some warnings are shown

Mar 26, 2023 11:12:42 AM org.icepdf.core.pobjects.Document setInputStream
WARNING: Cross reference deferred loading failed, will fall back to linear reading.
Mar 26, 2023 11:12:42 AM org.icepdf.core.pobjects.Catalog <clinit>
INFO: ICEpdf Core 7.0.2
Mar 26, 2023 11:12:44 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6876, font size: 1689704
Mar 26, 2023 11:12:46 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table 'DSIG' which goes past the file size; offset: 9517144, size: 65536, font size: 9524020
Mar 26, 2023 11:12:46 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table 'DSIG' which goes past the file size; offset: 9733700, size: 65536, font size: 9740576
Mar 26, 2023 11:12:47 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 668, size: 1146308935, font size: 27506260
Mar 26, 2023 11:12:48 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 604, size: 1146308935, font size: 36791212
Mar 26, 2023 11:12:48 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 780, size: 1146308935, font size: 9209540
Mar 26, 2023 11:12:48 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 21482296
Mar 26, 2023 11:12:48 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 14437520
Mar 26, 2023 11:12:48 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 11106464
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6772, font size: 10080360
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 21632712
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 14460368
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 12058672
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:49 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 18259888
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 941112
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 929364
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 994664
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 984412
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:50 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:51 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:51 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:51 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:51 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:52 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6876, font size: 1689704
Mar 26, 2023 11:12:53 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table 'DSIG' which goes past the file size; offset: 9517144, size: 65536, font size: 9524020
Mar 26, 2023 11:12:53 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table 'DSIG' which goes past the file size; offset: 9733700, size: 65536, font size: 9740576
Mar 26, 2023 11:12:53 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 668, size: 1146308935, font size: 27506260
Mar 26, 2023 11:12:53 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 604, size: 1146308935, font size: 36791212
Mar 26, 2023 11:12:53 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '   $' which goes past the file size; offset: 780, size: 1146308935, font size: 9209540
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 21482296
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 14437520
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 11106464
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6772, font size: 10080360
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 21632712
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 14460368
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 12058672
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:54 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '    ' which goes past the file size; offset: 1146308935, size: 6880, font size: 18259888
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 941112
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 929364
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 994664
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.TTFParser parse
WARNING: Skip table '  �' which goes past the file size; offset: 1146308935, size: 6876, font size: 984412
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
Mar 26, 2023 11:12:55 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
2
Mar 26, 2023 11:14:45 AM org.icepdf.core.pobjects.Document setInputStream
WARNING: Cross reference deferred loading failed, will fall back to linear reading.

Muhammad-Muddasir avatar Mar 26 '23 06:03 Muhammad-Muddasir

I think partial searching works sometimes: when only the first character is entered, the search happens. But when two or more characters are entered, the search results disappear from the PDF page.

ctoabidmaqbool avatar Mar 27 '23 04:03 ctoabidmaqbool

This is an interestingly encoded PDF. The text selection and search code use the same base layout code to determine glyph order. The system property org.icepdf.core.views.page.text.spaceFraction=1 improves the situation a bit, but the results still aren't ideal. I'll need to take a closer look to figure out what's happening here with the auto space detection code.
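
For anyone who wants to try that property, here is a minimal sketch. The property name and value are the ones given above; the class name `SpaceFractionConfig` is hypothetical, and the idea that the property has to be set before any ICEpdf class is loaded is an assumption (libraries often read such properties once at class initialization), not something confirmed in this thread.

```java
// Hypothetical helper, not part of ICEpdf: sets the property named in this comment.
final class SpaceFractionConfig {
    static final String KEY = "org.icepdf.core.views.page.text.spaceFraction";

    // Sets the system property and returns the value that is now in effect.
    static String apply(String value) {
        System.setProperty(KEY, value);
        return System.getProperty(KEY);
    }

    public static void main(String[] args) {
        // Assumption: do this before the first ICEpdf class is touched.
        // The command-line equivalent is:
        //   -Dorg.icepdf.core.views.page.text.spaceFraction=1
        System.out.println(KEY + "=" + apply("1"));
    }
}
```

Passing the `-D` flag on the JVM command line avoids any ordering question, since the property is then present from startup.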

pcorless avatar Apr 01 '23 02:04 pcorless

Hi @pcorless, I'm using ICEpdf versions com.github.pcorless.icepdf:icepdf-core:7.0.2 and com.github.pcorless.icepdf:icepdf-viewer:7.0.2. The PDF file was generated with iText version com.itextpdf:itextpdf:5.5.13.2.

Muhammad-Muddasir avatar Apr 01 '23 05:04 Muhammad-Muddasir

I finally got back to this issue. I have a hunch that the landscape layout is throwing off the text sorting code. This is a really good test case for dealing with text that is laid out using the y coord instead of x.
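
To illustrate the hunch, here is a rough sketch of why the choice of axis matters when sorting glyphs into reading order. This is not ICEpdf's sorting code: the `Glyph` type, the `sortAndJoin` method and the bucketing by a fixed tolerance are all made up for illustration, and it assumes both coordinates increase in the reading direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: not ICEpdf's actual text-sorting code.
final class ReadingOrderSketch {
    // Hypothetical glyph: its text plus the origin of its bounding box.
    static final class Glyph {
        final String text;
        final double x;
        final double y;
        Glyph(String text, double x, double y) { this.text = text; this.x = x; this.y = y; }
    }

    // alongY == false: text advances along x, lines are separated along y.
    // alongY == true:  text advances along y, lines are separated along x.
    // Glyphs whose line coordinate rounds to the same multiple of
    // lineTolerance are treated as one line.
    static String sortAndJoin(List<Glyph> glyphs, boolean alongY, double lineTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        Comparator<Glyph> byLine = Comparator.comparingLong(
                (Glyph g) -> Math.round((alongY ? g.x : g.y) / lineTolerance));
        Comparator<Glyph> byAdvance = Comparator.comparingDouble(
                (Glyph g) -> alongY ? g.y : g.x);
        sorted.sort(byLine.thenComparing(byAdvance));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) sb.append(g.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two lines, "ab" and "cd", each running along the y axis.
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("a", 100, 10), new Glyph("c", 112, 10),
                new Glyph("b", 100, 16), new Glyph("d", 112, 16));
        System.out.println(sortAndJoin(glyphs, false, 6)); // prints "acbd"
        System.out.println(sortAndJoin(glyphs, true, 6));  // prints "abcd"
    }
}
```

With the wrong axis the two lines get interleaved, so a search for "ab" would find nothing even though every glyph is present.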

pcorless avatar May 12 '23 04:05 pcorless

Hi! Any progress on this? Searching in very lengthy reports is a very necessary feature!

I am facing the issue in the very latest ICEpdf library too, e.g. 7.2.0.

The reports are generated using itextpdf 5.5.13.2. The same issue occurs in both portrait and landscape. I have hundreds of different reports in my software and the same issue occurs in every report!

Should I make a sample repo to test the issue, or is this an already-detected one?

ctoabidmaqbool1 avatar May 01 '24 08:05 ctoabidmaqbool1

Sorry I haven't looked at this one in a while. I'll try and make some time for it as I do have some new ideas on how to solve this that came out of the redaction work.

pcorless avatar May 03 '24 03:05 pcorless

iText PDF 5.x is very old, and I can't use 7.x due to a license issue.

So I have tried switching to the forked version, OpenPDF 2.x, which is still active and up to date!

With OpenPDF the issue is still the same: searching does not work properly, and text selection does not work properly either!

image

Sample report saved through the ICEpdf viewer (the original report is generated in memory)!

PurchaseReport-new.pdf

ctoabidmaqbool1 avatar May 06 '24 06:05 ctoabidmaqbool1

Any progress on this yet, i.e. the searching issue with reports generated by the latest itextpdf 5.x or the latest openpdf?

ctoabidmaqbool avatar Mar 27 '25 03:03 ctoabidmaqbool

Sorry, this got pushed back. I'll bring it forward as I came across this rotation issue just recently while looking at annotations not showing correctly.

pcorless avatar Mar 27 '25 04:03 pcorless

I've got something working that addresses the text sorting issues you saw when searching. There also seems to be a page lock issue that makes the search highlight flicker a bit or not show at all if the text selection tool isn't selected. I'm looking into that now.

pcorless avatar Mar 29 '25 04:03 pcorless

The change has been merged into main and will be part of the 7.2.4 patch. Should be released by April 11.

pcorless avatar Apr 11 '25 03:04 pcorless

Please re-open this issue, as it seems not to be completely fixed yet!

Testing report: InvoiceDetailReport-new.pdf

I have tried searching for many words, e.g. Shop, #12, +92-321-1234567, Sales, Customer, which do not seem to be found.

Image

The report is generated using openpdf:2.0.3

and written to a ByteArrayOutputStream and then opened in icepdf-viewer:7.2.4_P01

my libs:

    implementation 'com.github.pcorless.icepdf:icepdf-core:7.2.4_P01'
    implementation 'com.github.pcorless.icepdf:icepdf-viewer:7.2.4_P01'
    implementation 'com.github.librepdf:openpdf:2.0.3'
    implementation 'org.apache.pdfbox:pdfbox:3.0.4'

I also tried resetting the settings:

regedit -> /HKEY_CURRENT_USER\SOFTWARE\JavaSoft\Prefs\org\icepdf\ri\util
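
The same reset can be sketched from Java instead of regedit. On Windows the `java.util.prefs` user root is stored under HKEY_CURRENT_USER\SOFTWARE\JavaSoft\Prefs, so the key above corresponds to the user node `/org/icepdf/ri/util`. That this node holds all of the viewer's saved settings is an assumption taken from the registry path, and the class and method names here are hypothetical.

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

// Hypothetical helper: clears a user-preferences node without opening regedit.
final class ResetViewerPrefs {
    // Removes every key stored directly under the given user-preferences node
    // and pushes the change to the backing store.
    static void clearNode(String nodePath) throws BackingStoreException {
        Preferences node = Preferences.userRoot().node(nodePath);
        node.clear();
        node.flush();
    }

    public static void main(String[] args) throws BackingStoreException {
        // Node path derived from the registry key mentioned above.
        clearNode("/org/icepdf/ri/util");
        System.out.println("cleared /org/icepdf/ri/util");
    }
}
```

Note that `clear()` only removes keys on that node itself; any child nodes would be left in place.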

ctoabidmaqbool avatar Apr 14 '25 06:04 ctoabidmaqbool

I'm not having a lot of luck reproducing that highlight issue on my system. Highlight seems to work as expected.

  • Can you go to the advanced search and see if hits are found for 'sales'? You should see two hits in the search results, and of course two highlights for the terms on the page.
  • Can you post any system properties you might have enabled when running the library?
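
A quick way to collect those is to dump every system property that starts with `org.icepdf.` from inside the application. This is only a sketch: the class and method names are made up, and the assumption is that all the relevant ICEpdf properties share that prefix.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for gathering properties to post in an issue.
final class ListIcepdfProperties {
    // Returns all properties whose name starts with the prefix, sorted by name.
    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> found = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                found.put(name, props.getProperty(name));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints nothing if no matching property has been set.
        withPrefix(System.getProperties(), "org.icepdf.")
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```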

pcorless avatar Apr 15 '25 03:04 pcorless

@pcorless I have created a project you can use to test the searching issue!

https://github.com/ctoabidmaqbool1/iTextPdf-5-And-ICEpdf-Viewer-Test/tree/searching-issue

Note: I also tried resetting the settings:

regedit -> /HKEY_CURRENT_USER\SOFTWARE\JavaSoft\Prefs\org\icepdf\ri\util

These words are not working for me!

Shop, #12, +92-321-1234567, Sales, Customer, Sales, #, Bonus, Qty (under Bonus), etc.

Image

Image

ctoabidmaqbool avatar Apr 15 '25 05:04 ctoabidmaqbool

As you can see, searching for Shop does not work!

Image

But Mart is working for me!

Image

ctoabidmaqbool avatar Apr 15 '25 05:04 ctoabidmaqbool

What a strange bug. I've created a PR on your project that you can take a look at. With these changes the search events fire as expected and I'm seeing what I'd see if I was using the reference viewer.

pcorless avatar Apr 18 '25 05:04 pcorless

> What a strange bug. I've created a PR on your project that you can take a look at. With these changes the search events fire as expected and I'm seeing what I'd see if I was using the reference viewer.

After applying your changes in searching-issue https://github.com/ctoabidmaqbool1/iTextPdf-5-And-ICEpdf-Viewer-Test/commit/07edc32ec535e869a890b5a4faf6740f914c5b07, I am facing some more issues!

Image

Image

Without SwingUtilities.invokeLater(() -> {}); the viewer opens, but the searching issue is the same! @pcorless
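
For reference, the general Swing convention behind `SwingUtilities.invokeLater` is that UI components are created and modified on the event dispatch thread. A minimal sketch of that pattern, using only the JDK, is below. The `runOnEdt` helper is hypothetical, the viewer construction is left as a placeholder comment, and whether threading has anything to do with the remaining search problem is not established here.

```java
import javax.swing.SwingUtilities;

// Illustration of the event-dispatch-thread convention; no ICEpdf classes involved.
final class EdtStartupSketch {
    // Runs the task on the event dispatch thread: directly if the caller is
    // already on it, otherwise queued with invokeLater.
    static void runOnEdt(Runnable task) {
        if (SwingUtilities.isEventDispatchThread()) {
            task.run();
        } else {
            SwingUtilities.invokeLater(task);
        }
    }

    public static void main(String[] args) {
        runOnEdt(() -> {
            // Placeholder: build the viewer panel, add it to a frame,
            // and open the document here.
            System.out.println("on EDT: " + SwingUtilities.isEventDispatchThread());
        });
    }
}
```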

ctoabidmaqbool avatar Apr 19 '25 09:04 ctoabidmaqbool