LLM bot IPs in community blocklist
Hi,
I've noticed that some IPs belonging to AI bots are present in the community blocklist, which shouldn’t happen, at least not for operators who publish their official IP ranges.
For example: 20.171.207.135 (OpenAI public IPs: https://openai.com/gptbot-ranges.txt)
These IPs should never be included in the community blocklist by default.
Ideally, users should have the option to treat known LLM/AI bot IPs as a separate category ("AI bots"), and then choose to either:
- whitelist all known AI bot IPs,
- blacklist them,
- or take no action.
Many people actually want their content crawled by these AI bots, hoping to appear in LLM-generated answers. Yes, I know it’s possible to manually whitelist these IPs (see: https://docs.crowdsec.net/docs/whitelist/create_capi/), but by default, these bots should not be blocked by the community lists.
A dedicated collection to either ban or whitelist these bots would be a good solution.
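To make the three options concrete, here is a minimal sketch in Python of how a per-user "AI bots" policy could sit in front of the community blocklist. It is only an illustration, not how CrowdSec implements anything: the names `Policy`, `AI_BOT_RANGES`, `is_ai_bot`, and `decide` are hypothetical, and the CIDR is a placeholder chosen so that the example IP above matches. A real setup would load the ranges the operator publishes.

```python
import ipaddress
from enum import Enum


class Policy(Enum):
    ALLOW = "allow"  # whitelist all known AI bot IPs
    BLOCK = "block"  # blacklist them
    NONE = "none"    # take no action, defer to the community blocklist


# Placeholder range (assumption): not copied from any published list.
AI_BOT_RANGES = [ipaddress.ip_network("20.171.207.0/24")]


def is_ai_bot(ip, ranges=AI_BOT_RANGES):
    """Return True if the IP falls inside one of the known AI bot ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def decide(ip, policy, in_community_blocklist):
    """Return True if the request should be blocked."""
    if is_ai_bot(ip):
        if policy is Policy.ALLOW:
            return False
        if policy is Policy.BLOCK:
            return True
    # Not an AI bot, or the user chose "no action": keep the default behaviour.
    return in_community_blocklist
```

With `Policy.ALLOW`, `decide("20.171.207.135", Policy.ALLOW, True)` lets the request through even though the IP is on the community blocklist; with `Policy.NONE` the blocklist verdict is kept unchanged.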
Hello,
We already tag these IPs as belonging to OpenAI. However, back when we enabled this, there was a lot of press about AI crawlers inadvertently DDoSing sites (especially open source projects), so we chose not to explicitly whitelist these IPs and instead let them move to the community blocklist if they disrespected rate limits too often.
However, we have now re-evaluated this policy and will shift OpenAI into the whitelist bucket (similar to what is already done with orgs such as Google and Apple, which are more respectful of server resources). It will take some time, but you can expect a fix to land sometime next week.
Thanks, nice! Yes, give people the choice. You could probably do an AI-bot blacklist scenario too!
Good idea, we'll look into it.