Error converting html to string.

Open tspier opened this issue 4 years ago • 6 comments

I'm not sure if it's an issue with the HTML of the website, if there's an issue parsing Tajiki, or something else, but I tried scraping http://www.jumhuriyat.tj/index.php?art_id=44635 on the Heroku demo page and received the following notice: Error converting html to string.

tspier avatar Jul 18 '21 23:07 tspier

I'm getting the same errors on multiple sites

passionetartufo avatar Jul 22 '21 00:07 passionetartufo

I am also facing the same issue. Is this repository still running?

itskrsna avatar Jul 27 '21 10:07 itskrsna

The article http://www.jumhuriyat.tj/index.php?art_id=44635 cannot be scraped with Newspaper3k. The reason is related to the structure of the HTML, which doesn't provide a clear block of article text to extract.
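To make "no clear block of article text" concrete, here is a minimal sketch using only the Python standard library. It is not Newspaper3k's actual extraction algorithm; it is a simplified stand-in that assumes article text lives in `<p>` elements and checks whether a page has enough of it. The names `ParagraphCollector` and `has_clear_text_block`, the 25-word threshold, and both sample pages are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the visible text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        # Normalise whitespace and store the paragraph if it is non-empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly ends any paragraph left unclosed.
            self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def has_clear_text_block(html, min_words=25):
    """Return True if the <p> elements together hold at least min_words words.

    The threshold is an arbitrary assumption for this sketch.
    """
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    total_words = sum(len(p.split()) for p in collector.paragraphs)
    return total_words >= min_words


# Hypothetical pages: one wraps its text in <p>, the other puts the same
# text directly inside a table cell, so a paragraph-based heuristic sees nothing.
sample_text = "word " * 40
well_structured = "<html><body><article><p>" + sample_text + "</p></article></body></html>"
table_layout = "<html><body><table><tr><td>" + sample_text + "</td></tr></table></body></html>"

print(has_clear_text_block(well_structured))
print(has_clear_text_block(table_layout))
```

A page whose text is not grouped into recognisable containers gives a heuristic like this nothing to score, which is the kind of failure described above.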

johnbumgarner avatar Aug 05 '21 17:08 johnbumgarner

> I'm getting the same errors on multiple sites

@giggioman00 What sites are giving you issues?

johnbumgarner avatar Oct 05 '21 16:10 johnbumgarner

> I am also facing the same issue. Is this repository still running?

@blueshirtdeveloper What sites are giving you issues?

johnbumgarner avatar Oct 05 '21 16:10 johnbumgarner

Not sure how dead this thread is, but I'm getting the error on all articles from the following domains (whole article path for easy access):

  • https://www.seattletimes.com/business/sticker-shock-heres-how-to-find-a-cheaper-flight-this-summer/
  • https://www.washingtonpost.com/opinions/2022/06/27/giuliani-backslap-supermarket/

joshhoegen avatar Jun 28 '22 16:06 joshhoegen