Python requests banned?
Describe the bug
I am building a Python app, and part of it interfaces with the Adoptium API. The Requests library sends the user-agent python-requests/2.31.0 by default, and requests with that header get a 403 Forbidden error. Not sending a user-agent at all also causes a 403 Forbidden. Setting the user-agent to something else fixes the issue, but that shouldn't be a necessary workaround: any user-agent should be allowed.
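For anyone who wants to check which user-agent their own install sends, here is a minimal sketch that needs no network access. It only reads the library defaults; the version suffix will differ depending on the installed Requests release.

```python
import requests

# The library's default User-Agent string, e.g. "python-requests/2.31.0".
# The version suffix depends on the installed release.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A fresh Session carries the same value in its default headers,
# which is what requests.get() uses under the hood.
session = requests.Session()
print(session.headers["User-Agent"])
```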
To Reproduce
- pip install requests
- Write a file like this:
import requests
resp = requests.get("https://api.adoptium.net/v3/info/available_releases")
print(resp.request.headers)
print(resp.headers)
print(resp.status_code)
print(resp.text)
- Run it. Observe 403 error (e.g. first screenshot)
- Change second line of file to be like this:
resp = requests.get("https://api.adoptium.net/v3/info/available_releases", headers={"User-Agent": "Dummy"})
- Run it again. Observe it works. Wow!
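The two runs above can also be folded into one script. This is only a sketch: the helper names `prepare` and `status_for` are made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
import requests

URL = "https://api.adoptium.net/v3/info/available_releases"


def prepare(session, user_agent=None):
    """Build the GET request without sending it.

    user_agent=None keeps the library default (python-requests/<version>).
    """
    headers = {"User-Agent": user_agent} if user_agent is not None else None
    return session.prepare_request(requests.Request("GET", URL, headers=headers))


def status_for(user_agent=None, timeout=10):
    """Send the request and return the HTTP status code (needs network access)."""
    with requests.Session() as session:
        prepared = prepare(session, user_agent)
        return session.send(prepared, timeout=timeout).status_code
```

Calling `status_for()` and then `status_for("Dummy")` gives the two status codes side by side. At the time of this report that was 403 for the default and success for the custom value, but the server behaviour may have changed since.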
Expected behavior
It works with the default requests config.
Screenshots
Doesn't work with the regular header.
Works when you put some random one. Crazy stuff: one HTTP header is the entire difference between failure and success.
Device (please complete the following information):
- OS: macOS Ventura 13.3.1
- Browser: N/A
- Version: N/A
Additional context
Looks to be an Azure misconfiguration. Also, this happened before, but it wasn't fixed properly.
@johnoliver any ideas here? I can confirm that I'm also seeing the same error when using the above test file.
I wonder if it's related to the hosting provider; see a similar SO question about Cloudflare:
https://stackoverflow.com/questions/74446830/how-to-fix-403-forbidden-errors-with-python-requests-even-with-user-agent-head
This sounds like some protection mechanism against data scraping.
I tested the above snippet and can confirm that I also receive a 403 error.
However, by simply adding a User-Agent header like this:
import requests
headers = {
    "User-Agent": "My User Agent 1.0",
}
resp = requests.get(
    "https://api.adoptium.net/v3/info/available_releases", headers=headers
)
print(resp.request.headers)
print(resp.headers)
print(resp.status_code)
print(resp.text)
the request passes:
{'User-Agent': 'My User Agent 1.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'Date': 'Sat, 21 Oct 2023 20:02:31 GMT', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '344', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-Pod-Hostname': 'frontend-service-89db8c5b7-tvvxp'}
200
{
  "available_lts_releases": [
    8,
    11,
    17,
    21
  ],
...
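Until the default user-agent is accepted again, one way to apply the workaround without repeating the header on every call is to set it once on a Session. This is a sketch, not an official recommendation: `make_session` and the `my-app/0.1` string are hypothetical placeholders, so substitute whatever identifies your own app.

```python
import requests

# Hypothetical identifier for illustration; use your own app name and version.
APP_USER_AGENT = "my-app/0.1"


def make_session(user_agent=APP_USER_AGENT):
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session
```

Every `session.get(...)` made through the returned object then sends that header instead of the python-requests default.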