Wanted for testing: PDF files with specific features
Some of the unit tests I have developed rely on PDF files that have certain features. My Calibre library holds 109+ PDF books, but none of them satisfies the needs below. In particular, I'm looking for:
- A PDF file with a /ASCIIHexDecode, or equivalently /AHx, stream filter.
- A PDF file with a /JPXDecode stream filter.
- More PDF files whose objects have /Type equal to /ObjStm, that is to say files that rely on Cross-Reference streams (PDF 1.5+).
- A few other hybrid-reference files, as described in section 7.5.8.4 of ISO 32000: files that use a Cross-Reference Table to hide elements stored in a Cross-Reference Stream, understandable by PDF 1.5+ readers only.
The reason for this request is to complete the fixture data collection (in tests/fixture_data/ of my current PR #14) of the project. PDF files with these characteristics seem to be rare, so I am asking for your help.
I have performed my searches with a simple grep. For example, for case 2 I ran:
grep -RPi --binary-files=text [--exclude-dir=<whatever you want>] "/JPXDecode" <arbitrary path>
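For anyone who prefers a script over grep, the same search for cases 1 and 2 could be sketched in Python roughly as follows. The names find_filters and scan_tree are made up for this illustration and are not part of the project. The sketch only scans raw bytes, so it can report false positives when a compressed stream happens to contain the same byte sequence, and it is case-sensitive, unlike the grep call above.

```python
import re
from pathlib import Path

# A PDF name ends at whitespace or a delimiter, so the negative lookahead
# keeps /JPXDecode from matching a longer name such as /JPXDecodeFoo.
FILTERS = {
    "ASCIIHexDecode": re.compile(rb"/(?:ASCIIHexDecode|AHx)(?![A-Za-z0-9])"),
    "JPXDecode": re.compile(rb"/JPXDecode(?![A-Za-z0-9])"),
}


def find_filters(path):
    """Return the set of filter labels whose name token occurs in the file."""
    data = Path(path).read_bytes()
    return {label for label, pattern in FILTERS.items() if pattern.search(data)}


def scan_tree(root):
    """Map each *.pdf path under root (as a string) to the filters found in it."""
    hits = {}
    for pdf in sorted(Path(root).rglob("*.pdf")):
        found = find_filters(pdf)
        if found:
            hits[str(pdf)] = found
    return hits
```

Calling scan_tree on a directory returns only the files with at least one hit, which makes it easy to pick fixture candidates from a large collection.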
1. Files containing "/ASCIIHexDecode"
I found three of these, all of them scans of books that I got from university and that someone else created, so I don't feel comfortable publishing them. I could send them to you over a private channel if that would be any help.
asciihexdecode.pdf is a single page that I extracted from one of these documents using pdfarranger with PyPDF2.
2. Files containing "/JPXDecode"
- https://www.elwis.de/DE/Sportschifffahrt/Sportbootfuehrerscheine/Navigationsaufgaben-SKS.pdf?__blob=publicationFile&v=3
- https://www.cs.uni-mainz.de/files/2018/02/00-LV-Info-SS2018-speicheropt-3.pdf
- I have more files, but I don't want to publish them.
3. /Type equal to /ObjStm
How can I grep for those?
00-LV-Info-SS2018-speicheropt-3.pdf contains <</Filter/FlateDecode/First 14/Length 343/N 2/Type/ObjStm>>stream. Is this sufficient?
4. Other hybrid-reference files
How would I identify those if I had them?
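One possible way to check for cases 3 and 4, offered as a rough sketch rather than a definitive test: object streams carry /Type /ObjStm in their (uncompressed) stream dictionary, and ISO 32000 describes hybrid-reference files as having an /XRefStm entry in the trailer dictionary of a classic cross-reference table. The function names classify and classify_file are hypothetical, and the byte-level matching assumes no comments or unusual escaping between the tokens.

```python
import re
from pathlib import Path

# /Type followed by the value /ObjStm, with optional whitespace in between.
OBJSTM = re.compile(rb"/Type\s*/ObjStm(?![A-Za-z0-9])")
# /XRefStm followed by an integer byte offset, as found in a hybrid trailer.
XREFSTM = re.compile(rb"/XRefStm\s+\d+")


def classify(data):
    """Report which structural markers occur in the raw PDF bytes."""
    return {
        "object_streams": bool(OBJSTM.search(data)),
        "hybrid_reference": bool(XREFSTM.search(data)),
    }


def classify_file(path):
    """Convenience wrapper that reads the file and classifies its bytes."""
    return classify(Path(path).read_bytes())
```

The dictionary quoted in point 3 would be reported with object_streams set to True. A file would count as a hybrid-reference candidate only if hybrid_reference is True, and a candidate should still be opened with a real PDF parser before it is used as a fixture.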