[BUG]: The New York Times is blocked
Description of the bug
Here's what I get when attempting to view an nytimes.com article:
You have been blocked from The New York Times because we suspect that you're a robot.
Why am I seeing this? There are a few possibilities:
You are browsing and clicking much faster than is typical of a human being
Something is preventing Javascript from working on your computer
There is a bot with the same IP address as you
That said, I love that this tool exists; it's exactly what I was looking for to add to my workflow for saving things to Wallabag without them getting cropped.
Steps To Reproduce
Using the latest version via docker-compose.yml.
Attempt to open any article from nytimes.com; it gets blocked.
I've tried with a publicly exposed instance as well as on a private network.
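For reference, a minimal reproduction sketch in Python. It assumes a local 13ft instance reachable at http://localhost:5000 that accepts the target URL appended to the path; the host, port, and example article URL are placeholders, so adjust them to your own compose setup.

```python
import requests

# Hypothetical reproduction script. INSTANCE and ARTICLE are placeholders;
# point INSTANCE at wherever your docker-compose stack exposes 13ft.
INSTANCE = "http://localhost:5000"
ARTICLE = "https://www.nytimes.com/section/technology"  # any nytimes.com article

resp = requests.get(f"{INSTANCE}/{ARTICLE}", timeout=30)
print(resp.status_code)

# The block page is easy to spot by its wording.
if "we suspect that you're a robot" in resp.text.lower():
    print("blocked")
else:
    print("article returned")
```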
Additional Information
No response
I'm getting this with WSJ too. It seems like they both use DataDome.
Could you get a different result if you route through a VPN or Tor?
I've done tests running it under Gluetun to route through my VPN, but I still get the same message as you guys.
What we're seeing here is the next round of the paywall cold war. If you understand how 13ft works, you'll know that it intentionally impersonates a search crawler to be passed through the paywall. NYT and others have changed how they evaluate crawlers and can now detect 13ft et al.
We won't see a "fix" for this until contributors refine the proxied request to again dodge the new host algorithm(s). I'd have a crack at a PR, but my Python is for shit.
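To make that concrete, here's a minimal sketch of the impersonation trick described above. This is not 13ft's actual code; the header values and function name are illustrative. The request simply presents a Googlebot User-Agent (plus a Google referrer) so the paywall serves the full article as it would to a crawler:

```python
import requests

# Sketch of the search-crawler impersonation idea, not 13ft's actual code.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def fetch_as_crawler(url: str) -> str:
    """Fetch a page while presenting a search-crawler identity."""
    headers = {
        "User-Agent": GOOGLEBOT_UA,
        # Claiming to arrive from Google search is part of the same trick.
        "Referer": "https://www.google.com/",
    }
    resp = requests.get(url, headers=headers, timeout=15)
    resp.raise_for_status()
    return resp.text
```

The catch is that a claimed Googlebot can be verified: Google publishes its crawler IP ranges and documents reverse-DNS verification, so a detector like DataDome can check whether a "Googlebot" request actually originates from Google's network. If that's what NYT and WSJ are now doing, header tweaks on the proxied request won't be enough on their own.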
@evansharp do you have a sense of what needs to be changed at a high level?