
NY Times doesn't work

Open abhigenie92 opened this issue 10 years ago • 1 comments

from goose import Goose

extractor = Goose()
article = extractor.extract(url='http://www.nytimes.com/2015/05/19/health/study-finds-dense-breast-tissue-isnt-always-a-high-cancer-risk.html?src=me&ref=general')
text = article.cleaned_text

abhigenie92, May 20 '15 21:05

NYT does a ton of redirecting, which is incredibly annoying. The strategy is to set the user agent so the request looks like it comes from a browser, then continue from there (learned from a colleague at Factr). If NYT doesn't like the user agent, it will sometimes put you in an infinite redirect loop. This is partly due to their paywall.

MaxwellRebo, Jun 22 '15 15:06
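
The user-agent suggestion above might look roughly like the sketch below. It relies on the `browser_user_agent` configuration key that python-goose accepts when a dict is passed to `Goose(...)`. The User-Agent string and the helper names `build_goose_config` and `extract_text` are assumptions made for illustration, not part of the library, and a browser-like user agent may still not be enough if the paywall redirects persist.

```python
# Assumed browser-like User-Agent string; any current browser UA could be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_goose_config(user_agent=BROWSER_USER_AGENT):
    """Hypothetical helper: return a Goose config dict that sets the User-Agent."""
    return {'browser_user_agent': user_agent}


def extract_text(url, user_agent=BROWSER_USER_AGENT):
    """Hypothetical helper: extract cleaned article text with a browser-like UA."""
    # Imported lazily so build_goose_config() works without goose installed.
    from goose import Goose

    extractor = Goose(build_goose_config(user_agent))
    article = extractor.extract(url=url)
    return article.cleaned_text
```

Calling `extract_text(...)` with the NYT URL from the original report would then send the browser-like header instead of the default one. Whether that gets past the redirect loop depends on how the site treats the request.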