503 on /clubs/{club_id}/players - rate limit?
The production API consistently returns 503 errors for /clubs/{club_id}/players?season_id={season}. Strangely, after running the same request locally, it then works in production for 1-2 minutes.
Reproduction

- Production fails:

  `curl "https://transfermarkt-api.fly.dev/clubs/631/players?season_id=2022"`

  503: "Service Unavailable for url: https://www.transfermarkt.com/-/kader/verein/631/saison_id/2022/plus/1"

- Local works fine:

  `curl "http://localhost:8001/clubs/631/players?season_id=2022"`

  ✅ 200

- Production now magically works:

  `curl "https://transfermarkt-api.fly.dev/clubs/631/players?season_id=2022"`

  ✅ 200

- Production fails again after a couple of minutes.
(I have my own version hosted on Fly and it's getting lots of random 503s on different endpoints, btw.)
I'm encountering the same issue. Have you managed to find a reliable fix? At the moment, my workaround in production is to wait two minutes after a failure before retrying. Unfortunately, this approach is painfully slow and makes completing all my tasks take forever.
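In case it helps anyone, here is roughly what that workaround looks like in code. This is just a sketch of my own setup: the two-minute cooldown matches what I described above, and the base URL is the public deployment's, so swap in your own.

```python
import time

import requests

BASE_URL = "https://transfermarkt-api.fly.dev"  # or your own deployment
COOLDOWN = 120  # seconds to wait after a 503 before retrying

def get_players(club_id: int, season_id: int, max_retries: int = 3) -> dict:
    """Fetch a club's squad, waiting out the block whenever a 503 comes back."""
    url = f"{BASE_URL}/clubs/{club_id}/players"
    for _ in range(max_retries):
        resp = requests.get(url, params={"season_id": season_id}, timeout=30)
        if resp.status_code != 503:
            resp.raise_for_status()  # surface any error other than the block
            return resp.json()
        time.sleep(COOLDOWN)  # blocked by Transfermarkt: sit out the window
    raise RuntimeError(f"still getting 503 after {max_retries} attempts: {url}")

squad = get_players(631, 2022)  # same request as in the reproduction above
```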
Hi everyone,
It seems this is an intentional measure on Transfermarkt’s side to discourage web scraping (as noted in other projects, e.g., this issue).
I’m not sure there’s much we can do about it at the moment, but if you have any ideas or suggestions, please feel free to share!
Transfermarkt is implementing more restrictive policies to curb scraping, so after a certain number of requests it blocks you with a 503 for a while.
Do you have a rough idea how many requests are allowed before you get a 503? I want to start using Transfermarkt data, but it would be useless if I can't make more than a few requests per minute.
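Nobody seems to have measured the exact budget, but given the roughly two-minute block described above, one option is to pace requests client-side and sit out the ban window whenever a 503 appears. A sketch of that idea follows; both numbers are guesses on my part, not documented limits:

```python
import time

import requests

MIN_INTERVAL = 6.0  # guess: ~10 requests/minute; the real budget is unknown
BLOCK_WAIT = 120.0  # the block seems to clear after roughly two minutes

_last_request = 0.0

def paced_get(url: str, **params) -> requests.Response:
    """Space requests out, and wait out the ban window when a 503 arrives."""
    global _last_request
    while True:
        wait = MIN_INTERVAL - (time.time() - _last_request)
        if wait > 0:
            time.sleep(wait)
        _last_request = time.time()
        resp = requests.get(url, params=params, timeout=30)
        if resp.status_code != 503:
            return resp
        time.sleep(BLOCK_WAIT)  # blocked: wait it out, then retry the same call
```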
I don't think Transfermarkt's access restrictions can or should be solved inside this code repository. For the deployed version on fly.dev, it may be viable to use a scraping platform or web proxy provider. What are your thoughts on this, @felipeall?
For example, ScrapingBot offers a free tier (scrapingbot pricing) with a monthly contingent of tokens. Additionally, a caching mechanism for the retrieved data would be necessary so as not to burn through the free tokens immediately (see the sketch at the end of this comment). This also implies some trade-offs:

- in data freshness, since the data can/should only be crawled so often
- probably also in the number of API endpoints that can be provided
There may be other, possibly even better platforms that offer a free tier. It may also be possible to contact sales and "ask for a free account or extra credits for an open-source project".

Do you think this is a way forward?
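To make the caching idea concrete, here is a minimal sketch of the kind of layer I have in mind. Everything here is hypothetical: `cached_fetch` and the 24-hour TTL are placeholders, and `fetch` stands for whatever call actually goes through the scraping provider and costs a token.

```python
import time
from typing import Any, Callable

TTL = 24 * 60 * 60  # refresh at most once a day; tune per endpoint
_cache: dict[tuple, tuple[float, Any]] = {}  # (url, params) -> (fetched_at, data)

def cached_fetch(fetch: Callable[..., Any], url: str, **params: Any) -> Any:
    """Serve from the cache while fresh; only hit the provider on a miss."""
    key = (url, tuple(sorted(params.items())))
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < TTL:
        return hit[1]  # fresh enough: no token spent
    data = fetch(url, **params)  # the only call that costs a provider token
    _cache[key] = (time.time(), data)
    return data
```

On fly.dev an in-process dict would be lost on every restart, so in practice a persistent store (Redis, SQLite, or a file on a volume) would probably be needed instead.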