
CrowdSec has a huge Matrix problem

Open ethindp opened this issue 3 years ago • 8 comments

Okay, so... CrowdSec has a major problem with Matrix. I just had to put my entire CrowdSec installation into simulation mode globally because it keeps generating a huge number of false positives and effectively isolating my server from the internet. What's the best way to fix this? I need to whitelist a few subdomains, then explicitly allow port 8448 (and probably a bunch of others). I could just uninstall all the HTTP-based scenarios, but I really don't want to do that because they are actually useful, just not in their current configuration.

ethindp avatar Sep 20 '22 14:09 ethindp

Hello,

Are you able to share logs generated by Matrix? A list of the scenarios that triggered false positives would be useful too :)

A possibility is to have some matrix-specific whitelists when it is relevant :)

buixor avatar Sep 27 '22 14:09 buixor

link to task #575

LaurenceJJones avatar Oct 27 '22 09:10 LaurenceJJones

Same here on my Synapse server, with:

  • crowdsecurity/http-probing (frequently)
  • crowdsecurity/http-crawl-non_statics (rarely)

Frequent 404 errors are normal for Matrix traffic.

Output of cscli alerts inspect -d <id>:

- Date: 2022-11-21 16:51:07 +0000 UTC
+-----------------+------------------------------------------------------------------------------+
|       KEY       |                                    VALUE                                     |
+-----------------+------------------------------------------------------------------------------+
| ASNNumber       |                                                                         3215 |
| ASNOrg          | Orange                                                                       |
| IsInEU          | true                                                                         |
| IsoCode         | FR                                                                           |
| SourceRange     | 92.184.96.0/19                                                               |
| datasource_path | /var/lib/caddy/log/matrix.domain.tld.log                                     |
| datasource_type | file                                                                         |
| http_path       | /_matrix/media/r0/preview_url?url=https%3A%2F%2Fapt-cacher.net.xxx.fr%3A3142 |
| http_status     |                                                                          404 |
| http_user_agent | Element/1.5.7 (Gigaset GS290;                                                |
|                 | Android 10; e_GS290-user                                                     |
|                 | 10 QQ3A.200805.001                                                           |
|                 | eng.root.20221031.181139                                                     |
|                 | dev-keys,dev-release; Flavour                                                |
|                 | FDroid; MatrixAndroidSdk2                                                    |
|                 | 1.5.7)                                                                       |
| http_verb       | GET                                                                          |
| log_type        | http_access-log                                                              |
| service         | http                                                                         |
| source_ip       | 92.184.xxx.x                                                                 |
| target_fqdn     | matrix.domain.tld                                                            |
| timestamp       | 2022-11-21T16:51:07Z                                                         |
+-----------------+------------------------------------------------------------------------------+

I found several ways to fix this with a postoverflow whitelist:

Normal:

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (generic) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('http_path') startsWith Lower('/_matrix/')
    - evt.Overflow.Alert.GetScenario() == 'crowdsecurity/http-probing'

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (target_fqdn) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('target_fqdn') == 'matrix.domain.tld'
    - evt.Overflow.Alert.GetScenario() == 'crowdsecurity/http-probing'

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (datasource_path) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('datasource_path') == '/var/lib/caddy/log/matrix.domain.tld.log'
    - evt.Overflow.Alert.GetScenario() == 'crowdsecurity/http-probing'

Or more aggressive:

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (generic) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('http_path') startsWith Lower('/_matrix/')
    - evt.Overflow.Alert.GetScenario() in ['crowdsecurity/http-probing', 'crowdsecurity/http-crawl-non_statics']

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (target_fqdn) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('target_fqdn') == 'matrix.domain.tld'
    - evt.Overflow.Alert.GetScenario() in ['crowdsecurity/http-probing', 'crowdsecurity/http-crawl-non_statics']

# cat /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml (datasource_path) :

name: tetsumaki/matrix
description: "custom matrix whitelist"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    - evt.Overflow.Alert.Events[0].GetMeta('datasource_path') == '/var/lib/caddy/log/matrix.domain.tld.log'
    - evt.Overflow.Alert.GetScenario() in ['crowdsecurity/http-probing', 'crowdsecurity/http-crawl-non_statics']
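A possible tightening of the variants above, offered as an untested sketch: as far as I understand CrowdSec's whitelist semantics, the entries under `expression` are evaluated independently, and an alert is whitelisted as soon as any one of them matches. If that holds, a scenario check on its own line would whitelist every alert from that scenario, not only the Matrix ones. Joining both conditions in a single entry with `&&` keeps the whitelist scoped to Matrix paths. The whitelist name is a placeholder.

```yaml
# /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix.yaml
# Sketch only: assumes list entries under `expression` are OR'ed,
# so both conditions are combined into one entry with &&.
name: example/matrix-scoped   # hypothetical name
description: "matrix whitelist scoped to path and scenario"
whitelist:
  reason: "whitelist false positive for matrix"
  expression:
    # The folded scalar (>-) joins the two lines into one expression.
    - >-
      evt.Overflow.Alert.GetScenario() in ['crowdsecurity/http-probing', 'crowdsecurity/http-crawl-non_statics']
      && evt.Overflow.Alert.Events[0].GetMeta('http_path') startsWith '/_matrix/'
```

Like the variants above, this only inspects the first event of the alert (`Events[0]`), so an alert whose first event is not a `/_matrix/` request would still go through.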

tetsumaki avatar Nov 21 '22 18:11 tetsumaki

@buixor Not anymore; I've migrated servers and currently don't have CrowdStrike set up, and I've now switched completely to Docker (with Traefik as the front-end server, which is itself containerized), so I have no idea whether CrowdStrike would even be able to pick up HTTP-based events from Traefik.

I do have my suspicions, however. In particular, I'm about 99 percent sure that this has to do with federation. Matrix is a complex set of specifications, and some of them (e.g. the Server-Server API) require that bots (that is, Matrix servers) contact other servers on port 8448 (this is by no means the only option, but let's use it since it's the default) and send federation requests. This allows for fully decentralized communications. On small servers that don't federate with much, this probably only happens 30-50 times a minute (perhaps a bit more; I have no way to know). But on large instances that federate with a lot of communities, like mine, it can happen thousands of times, if not tens of thousands. I have no way of acquiring "times per minute"-style metrics, however, so I can't give you accurate numbers. Without context, the frequency of federation requests (or, really, any server-to-server requests) could look like an HTTP DDoS, since all of this happens transparently and in an automated fashion.

ethindp avatar Feb 23 '23 15:02 ethindp

Okay, so I have CrowdSec installed (I used CrowdStrike in my previous comment, sorry about that). I'm not really sure how to get the specific requests that cause CrowdSec to ban Matrix servers, but one possibility is a whitelist that ignores crowdsecurity/http-probing and crowdsecurity/http-crawl-non_statics for certain domains only. The playbook I'm using has recently migrated to Traefik, though Nginx is still used for some things. The eventual goal is to migrate entirely over to Traefik. Is this possible at the moment? I am currently using cscli simulation enable --global to ensure that everything is just simulated and no actual action is taken (the bouncer is also not running), so I can figure out how to prevent CrowdSec from freaking out about Matrix and risking banning everyone who tries connecting to my server.
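The "certain domains only" idea could look roughly like the following postoverflow whitelist. It is a sketch, assuming the reverse proxy's access logs are parsed so that `target_fqdn` is populated (as in the alert shown earlier in this thread). The domain names and the whitelist name are placeholders. Both conditions sit in one entry joined by `&&`, on the assumption that separate `expression` entries are OR'ed and a scenario match alone would otherwise whitelist alerts for unrelated hosts.

```yaml
# /etc/crowdsec/postoverflows/s01-whitelist/whitelist-matrix-domains.yaml
# Sketch only: domain names and whitelist name are hypothetical.
name: example/matrix-domains
description: "ignore probing/crawl alerts for Matrix hostnames only"
whitelist:
  reason: "Matrix federation and client traffic"
  expression:
    # One combined condition: the host must be listed AND the scenario must match.
    - >-
      evt.Overflow.Alert.Events[0].GetMeta('target_fqdn') in ['matrix.example.org', 'element.example.org']
      && evt.Overflow.Alert.GetScenario() in ['crowdsecurity/http-probing', 'crowdsecurity/http-crawl-non_statics']
```

Keeping global simulation mode on while testing a file like this means any remaining false positives show up as simulated alerts rather than real bans.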

ethindp avatar Mar 10 '23 20:03 ethindp

@tetsumaki's solution works great, thank you

VPaulV avatar Jun 23 '23 09:06 VPaulV

After an upgrade to 1.5.3, @tetsumaki's solution stopped working somehow.

helkaluin avatar Sep 20 '23 03:09 helkaluin

After an upgrade to 1.5.3, @tetsumaki's solution stopped working somehow.

We posted in the Discord that a new release coming out on 20 Sept has a fix for postoverflow whitelists, as 1.5.3 had a bug.

LaurenceJJones avatar Sep 20 '23 04:09 LaurenceJJones