ruby-readability
Port of arc90's readability project to Ruby
```
Failures:

  1) Readability images should show one image, but outside of the best candidate
     Failure/Error: @input = @input.gsub(REGEXES[:replaceBrsRe], '').gsub(REGEXES[:replaceFontsRe], '')
     ArgumentError: invalid byte sequence in UTF-8
     # ./lib/readability.rb:48:in `gsub'...
```
When you don't put 'div' as a tag in the initializer, like:

    require 'rubygems'
    require 'readability'
    require 'open-uri'

    source = open('https://developers.google.com/custom-search/docs/tutorial/creatingcse').read
    puts Readability::Document.new(source, tags: []).content

it throws the error: NullPointerException:...
First, thanks for your work on readability :-) Just a quick feedback (I'm not a heavy user myself): while upgrading an old setup today, I noticed that a raw content...
I wanted to display the content of the article as it is, that is, with images. Can this be implemented?
E.g., ``` ... ... ``` will only return `/v9/images/1x1-white.jpg` when `images()` is called. However, if `images_with_fqdn_uris!("http://bla.com")` is called then subsequent calls to `images()` will return an array with all image...
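For readers unfamiliar with what turning an image path into a fully qualified URI involves, here is a minimal sketch using only Ruby's standard `uri` library. The helper name `absolutize` is hypothetical and is not the gem's `images_with_fqdn_uris!` implementation; it only illustrates how a relative `src` such as the one above could be resolved against a base URL.

```ruby
require 'uri'

# Hypothetical helper for illustration only; not part of ruby-readability.
# Resolves an image src against a base URL and returns an absolute URI string,
# or nil when the src cannot be parsed as a URI.
def absolutize(src, base)
  URI.join(base, src).to_s
rescue URI::InvalidURIError
  nil
end

# A root-relative path is resolved against the host of the base URL.
absolutize('/v9/images/1x1-white.jpg', 'http://bla.com')
# => "http://bla.com/v9/images/1x1-white.jpg"

# An src that is already absolute is returned unchanged.
absolutize('http://cdn.example.org/a.png', 'http://bla.com')
# => "http://cdn.example.org/a.png"
```

Because this sketch returns a new string instead of mutating anything, calling it has no effect on what later lookups return, which is the kind of side effect the report above describes.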
Hello, I tried to apply readability on a specific layout of The Guardian, which heavily relies on JavaScript but still has most of the text available in the HTML source...
Setting `:get_largest_image => true` will return only the single largest image from `.images()`. Check for the image size in the style attribute, if available, like: `style="width:400px; height:300px"`.
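The style-attribute check described above could look roughly like the following. This is a sketch under stated assumptions, not the code from the pull request: `size_from_style` and `largest_by_style` are hypothetical names, only pixel values are handled, and "largest" is assumed to mean the greatest width times height.

```ruby
# Hypothetical helper, not part of ruby-readability's API.
# Returns [width, height] in pixels parsed from an inline style string,
# or nil when either dimension is missing or not given in px.
def size_from_style(style)
  return nil if style.nil?

  # Anchor on the start of the string or a ';' so that properties such as
  # max-width are not mistaken for width.
  width  = style[/(?:\A|;)\s*width\s*:\s*(\d+)px/i, 1]
  height = style[/(?:\A|;)\s*height\s*:\s*(\d+)px/i, 1]
  return nil unless width && height

  [width.to_i, height.to_i]
end

# Assumption: the "largest" image is the one with the greatest pixel area.
# Returns the style string with the largest area, or nil if none has a size.
def largest_by_style(styles)
  sized = styles.map { |s| [s, size_from_style(s)] }
                .reject { |_, size| size.nil? }
  best = sized.max_by { |_, (w, h)| w * h }
  best && best.first
end

size_from_style('width:400px; height:300px')
# => [400, 300]
```

Images without a usable style attribute yield `nil` here, so a real implementation would need a fallback (for example the `width` and `height` attributes) before deciding which image is largest.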
Is it possible to get the HTML of the main content area? I would like to preserve the tags present in the main content area (whitelisted tags, mainly the divs,...