python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Results 43 python-readability issues

E.g. at https://edition.cnn.com/2020/07/24/politics/donald-trump-coronavirus-briefing-jacksonville/index.html, a lot of the text content is unfortunately not present in the .summary() result.

When trying to use xpath=True in summary to extract the main content, you get the wrong result for several webpages; otherwise it's correct. The reason is that the length check...

I noticed that sequences like ` A B C ` are transformed into ` A B C ` instead of something like ` A B C `. This causes...

Hi! Sorry for probably asking a simple thing, but nevertheless: is it possible to use this readability library to simplify a webpage with pictures? Like, let's say, most Medium articles....

On some websites (Phoronix, for example), some tags (notably `a` and `em`) are wrapped in their own unnecessary paragraph. This causes unnecessary line breaks, ultimately hurting the page's readability. Here's...

As of now only strings containing HTML seem to be acceptable input. Is there a way to pass an object parsed by LXML or `lxml.html` (types: `etree._ElementTree` and `html.HtmlElement`) straight...

It seems like embedded Instagram and Twitter content is filtered out. How can I keep it? (example: https://www.businessinsider.com/elon-musk-tweets-bernie-sanders-meme-2020-2)

SVGs often render way too big on most websites (see e.g. [github](https://github.com/baskerville/plato/blob/e6d071a3258f2ef9eb38881ff5641b8782c3c30f/doc/LIBRARY.md) and the [mozilla docs](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Adding_vector_graphics_to_the_Web), see also the screenshot below), providing quite the distraction. Moreover, they are generally non-informative...

We are processing the text from https://www.fiolinjurylaw.com/ using readability, and much of the content is missing. I've attached the readability output as generated by: $ python -m readability.readability -u...

In the `score_paragraphs` method, the content score is calculated like this: **content_score += len(inner_text.split(','))** But I think it should be like below, because there may be no comma in a text....
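
To make the behaviour under discussion concrete, here is a minimal sketch of how the quoted expression evaluates on a few inputs. It only mirrors the single expression shown in the issue; the helper name `comma_score` and the sample strings are made up for illustration, and the fix the issue goes on to propose is cut off above, so it is not reproduced here.

```python
def comma_score(inner_text: str) -> int:
    """Hypothetical helper mirroring the quoted expression:
    len(inner_text.split(','))."""
    return len(inner_text.split(','))


# Illustrative sample texts (not taken from the issue).
samples = [
    "No commas here",
    "One, two, three",
    "",
]

for text in samples:
    # split(',') always returns at least one element, so the result is
    # the number of commas plus one, even for text without any comma.
    print(repr(text),
          "| split length:", comma_score(text),
          "| comma count:", text.count(','))
```

Since `str.split(',')` always returns a list with at least one element, the expression adds at least 1 to the score for every paragraph, including paragraphs that contain no comma at all. Whether that is intended depends on the scoring scheme.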