pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.

Results 73 pdf-reader issues

PDF with multiple columns doesn’t extract text properly: When I tried to extract text from a PDF with a two-column layout, the text is read row by row...
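To illustrate the row-by-row behaviour this report describes, here is a minimal sketch that contrasts row-order and column-order reading of positioned text fragments. It does not use PDF::Reader's API: `Run`, `row_order`, `column_order`, and the `split_x` gutter position are all hypothetical names invented for this example, and it assumes exactly two columns separated by a known x coordinate.

```ruby
# Hypothetical stand-in for a positioned text fragment.
# This is NOT PDF::Reader's own class, only an illustration.
Run = Struct.new(:x, :y, :text)

# Row-by-row reading: top to bottom, then left to right.
# Assumes y grows upward, as in PDF user space.
def row_order(runs)
  runs.sort_by { |run| [-run.y, run.x] }
end

# Column-aware reading: split at an assumed gutter position (split_x),
# then read each column top to bottom, left column first.
def column_order(runs, split_x)
  left, right = runs.partition { |run| run.x < split_x }
  [left, right].flat_map do |column|
    column.sort_by { |run| [-run.y, run.x] }
  end
end

runs = [
  Run.new(50, 700, "Left 1"),
  Run.new(320, 700, "Right 1"),
  Run.new(50, 680, "Left 2"),
  Run.new(320, 680, "Right 2"),
]

puts row_order(runs).map(&:text).join(" | ")
puts column_order(runs, 300).map(&:text).join(" | ")
```

With row ordering the two columns are interleaved line by line; with the column split, each column is read in full before the next. A real document would need the gutter position detected rather than supplied by hand.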

For some reason `PDF::Reader#text` does not return all the text on a PDF file I'm scanning. However, I'm able to get the text by looking at the runs directly. Here...

I noticed this gem has problems parsing some PDFs where the text is not necessarily clean. For instance, this file: https://www.jstor.org/stable/3684663 Some parts of it get output like: "a b...

I am not sure if this is a problem with the pdf itself, but it seems like when mapping the `mean_character_width` from `@runs` in initialize of lib/pdf/reader/page_layout.rb that the width...

Trying to read this pdf --> [h7E6IP36VnmkCJjM3dWL_0.pdf](https://github.com/yob/pdf-reader/files/11969978/h7E6IP36VnmkCJjM3dWL_0.pdf) `PDF::Reader.new("h7E6IP36VnmkCJjM3dWL_0.pdf").page(1).text` Got PDF::Reader::MalformedPDFError (PDF malformed, expected 'endstream' but found '1' instead) Additional Info: gem "pdf-reader", "~> 2.11"

I have the following PDF, which is not encrypted, only locked for edits. [bad_decrypt.pdf](https://github.com/yob/pdf-reader/files/14349837/bad_decrypt.pdf) When trying to read it, it raises `OpenSSL::Cipher::CipherError: bad decrypt` error: ```ruby PDF::Reader.new("./bad_decrypt.pdf").pages /app/vendor/bundle/ruby/3.1.0/gems/pdf-reader-2.12.0/lib/pdf/reader/aes_v2_security_handler.rb:37:in `final': bad...

Hi there! I'm not a lawyer, but `pdf-reader` is under the MIT license (https://github.com/yob/pdf-reader/blob/main/MIT-LICENSE) and uses `ttfunk` as a dependency, which is under the GPL2/GPL3 license (https://github.com/prawnpdf/ttfunk/blob/master/LICENSE). AFAIK mixing both is not...

We're using PDF::Reader at Zipline for parsing content out of PDFs. (I also forked this project on our team repo [here](https://github.com/retailzipline/pdf-reader).) We have a number of cases where we want...

[Pages-tree-refs.pdf](https://github.com/yob/pdf-reader/files/13798724/Pages-tree-refs.pdf) ([source](https://github.com/mozilla/pdf.js/blob/master/test/pdfs/Pages-tree-refs.pdf)) Running the following script with the attached PDF renders the following error: ```ruby require "bundler/inline" gemfile do gem "pdf-reader" end PDF::Reader.new("Pages-tree-refs.pdf").pages # /usr/local/bundle/gems/pdf-reader-2.12.0/lib/pdf/reader/reference.rb:65:in `hash': stack level too deep...

First of all, thanks for the work and effort you've put into this great library! ## Bug description We are having an issue with numerals not being read correctly by...