pdf-reader-turtletext

`fuzzed_y': undefined method `keys' for #<Array:0...>

Status: Open. cppljevans opened this issue 7 years ago • 3 comments

The code is:

=begin
The following requires are copied from:
  https://github.com/tardate/pdf-reader-turtletext/blob/master/lib/pdf-reader-turtletext.rb
=end
require 'pdf-reader'
require 'pdf/reader/patch/object_hash'
require 'pdf/reader/positional_text_receiver'

require 'pdf/reader/turtletext'
require 'pdf/reader/turtletext/version'
require 'pdf/reader/turtletext/textangle'

=begin
The following from:
  "How to instantiate Turtletext in code"
  https://github.com/tardate/pdf-reader-turtletext
=end
pdf_filename = '../taxforms/f1065-2017.pdf'
reader = PDF::Reader::Turtletext.new(pdf_filename)
=begin
The following from:
  "How to extract text within a region described in relation to other text"
  https://github.com/tardate/pdf-reader-turtletext
=end
textangle = reader.bounding_box do
  page 4
end
textangle.text

However, when run with my Ruby 2.3, it produces the error in the subject line:

make -k
gem list pdf-reader

*** LOCAL GEMS ***

pdf-reader (1.4.0, 1.1.1)
pdf-reader-html (0.1.0)
pdf-reader-markup (0.0.1)
pdf-reader-turtletext (0.2.2)
ruby how_to_instantiate.rb
/var/lib/gems/2.3.0/gems/pdf-reader-turtletext-0.2.2/lib/pdf/reader/turtletext.rb:53:in `fuzzed_y': undefined method `keys' for #<Array:0x000000013d5058> (NoMethodError)
	from /var/lib/gems/2.3.0/gems/pdf-reader-turtletext-0.2.2/lib/pdf/reader/turtletext.rb:42:in `content'
	from /var/lib/gems/2.3.0/gems/pdf-reader-turtletext-0.2.2/lib/pdf/reader/turtletext.rb:87:in `text_in_region'
	from /var/lib/gems/2.3.0/gems/pdf-reader-turtletext-0.2.2/lib/pdf/reader/turtletext/textangle.rb:134:in `text'
	from how_to_instantiate.rb:28:in `<main>'
Makefile:4: recipe for target 'how_to_instantiate' failed
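The last frame of the traceback shows `fuzzed_y` calling `keys` on an Array: the method expects a Hash but receives an Array, presumably because of a change in pdf-reader after 1.2, as the first reply suggests. A minimal, hypothetical reproduction of that failure mode in plain Ruby follows; `sorted_ys` and the sample data are made up for illustration and are not turtletext's code.

```ruby
# Hypothetical helper, not turtletext's code: it assumes a Hash { y => row }.
def sorted_ys(input)
  input.keys.sort.reverse # Hash#keys exists; Array has no #keys method
end

hash_input  = { 700.0 => 'Name', 650.0 => 'Total' }
array_input = [[700.0, 'Name'], [650.0, 'Total']]

p sorted_ys(hash_input) # prints [700.0, 650.0]

begin
  sorted_ys(array_input)
rescue NoMethodError => e
  # Same class of error as in the traceback; exact wording varies by Ruby version.
  puts "NoMethodError: #{e.message}"
end
```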

Is this a bug in pdf-reader-turtletext or am I at fault?

TIA.

cppljevans, Feb 21 '18

@cppljevans Please check out the fork from tkieley. I made a small patch so it works with pdf-reader > 1.2.

MatthewSuttles, Mar 02 '18

Thanks @MatthewSuttles; I cloned the tkieley fork and passed -I<patch> to the ruby command, where <patch> is the path to the lib subdirectory of the downloaded fork. It no longer shows the error.
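`-I` prepends a directory to Ruby's load path. If you would rather not change the command line, a script can do the same by prepending to `$LOAD_PATH` before requiring the library. This is a sketch that assumes the fork was cloned into a hypothetical `pdf-reader-turtletext` directory under the current working directory; adjust `FORK_LIB` to your actual clone.

```ruby
# Hypothetical location of the cloned fork's lib directory; adjust as needed.
FORK_LIB = File.expand_path('pdf-reader-turtletext/lib')

# Same effect as `ruby -I<fork>/lib`: put the fork first on the load path so
# `require` should find its files before those of the installed 0.2.2 gem.
$LOAD_PATH.unshift(FORK_LIB) unless $LOAD_PATH.include?(FORK_LIB)

puts "Searching #{$LOAD_PATH.first} first"

# Then require the library as in the original script, e.g.:
#   require 'pdf/reader/turtletext'
# and check which copy was actually loaded with:
#   puts $LOADED_FEATURES.grep(/turtletext/)
```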

However, the new fuzzed_y code uses .select instead of .find. Wouldn't .find be faster, since it stops searching as soon as it finds the first item satisfying the condition? IOW, something like:

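    hash_sort.each do |precise_y|
      # find stops at the first existing y within y_precision of precise_y
      matching_y = output.map(&:first).find { |new_y|
        (new_y - precise_y).abs < y_precision
      } || precise_y
      # ... rest of the loop body as in the fork ...
    end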
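To make the `.find` versus `.select` point concrete, here is a self-contained sketch. `fuzz_rows` is a hypothetical stand-in for turtletext's `fuzzed_y`, assuming its input is a Hash of `{ y => { x => text } }`; the real method's data structures may differ. `find` and `select { ... }.first` return the same element (the first match in order), so the swap does not change the result; only the number of elements tested differs.

```ruby
# Hypothetical sketch in the spirit of fuzzed_y, not the gem's actual code.
# Rows whose y differs by less than y_precision from an already-kept row
# are merged into that row.
def fuzz_rows(input, y_precision)
  output = [] # [[y, { x => text }], ...], highest y (top of page) first
  input.keys.sort.reverse_each do |precise_y|
    # find returns the first kept row within tolerance and stops scanning;
    # select { ... }.first would return the same row but test every row.
    row = output.find { |new_y, _| (new_y - precise_y).abs < y_precision }
    if row
      row[1].merge!(input[precise_y])
    else
      output << [precise_y, input[precise_y].dup]
    end
  end
  output
end

# Count block calls to show find short-circuiting where select does not.
ys = [700.0, 650.0, 600.0, 550.0]
find_calls = 0
select_calls = 0
by_find = ys.find { |y| find_calls += 1; (y - 699.6).abs < 1 }
by_select = ys.select { |y| select_calls += 1; (y - 699.6).abs < 1 }.first
puts "find: #{by_find} after #{find_calls} test(s)"       # find: 700.0 after 1 test(s)
puts "select: #{by_select} after #{select_calls} test(s)" # select: 700.0 after 4 test(s)

rows = {
  700.0 => { 50.0 => 'Name' },
  699.5 => { 200.0 => 'Smith' },
  650.0 => { 50.0 => 'Total' }
}
p fuzz_rows(rows, 1) # 699.5 is merged into the 700.0 row; 650.0 stays separate
```

Note that `output.map(&:first)` in the snippet above still builds a new array on every iteration even when `find` stops early; searching `output` directly, as in this sketch, avoids that allocation as well.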

cppljevans, Mar 04 '18

For anyone still having this issue, I submitted a pull request (#10) to fix compatibility with pdf-reader 2.4.0. You can find it in the update-gem branch of https://github.com/emmeryn/pdf-reader-turtletext.

emmeryn, Dec 05 '19