
Decision stream API returns a huge response every 2h (crowdsec 1.4.3)

Open aderumier opened this issue 2 years ago • 9 comments

What happened?

Hi, I'm currently using the haproxy bouncer in stream mode.

I don't know why, but exactly every 2h the stream API returns a really big response, around 2MB (maybe the full decision list?), and one haproxy core goes to 100% for 1~2min.

Since I'm using a central API, all my haproxy instances (around 100~200) do the same at exactly the same time, which uses a lot of CPU on my clusters.

In normal operation, I only get a few bytes back from the API.

The number of decisions seems to be around 17000:

# cscli decisions list --all |wc -l : 17200

haproxy log with call to local api:

big result

-:- [03/Sep/2023:06:57:05.954] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/418/425 200 **2360631** - - ---- 1/0/0/0/0 0/0 {} "GET http://x.x.x.x:8080/v1/decisions/stream?startup=false HTTP/1.1"

normal result

-:- [03/Sep/2023:07:57:42.524] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/18/18 200 458 - - ---- 1/0/0/0/0 0/0 {} "GET http://x.x.x.x:8080/v1/decisions/stream?startup=false HTTP/1.1"
-:- [03/Sep/2023:07:57:52.548] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/18/18 200 319 - - ---- 0/0/0/0/0 0/0 {} "GET http://x.x.x.x:8080/v1/decisions/stream?startup=false HTTP/1.1"
-:- [03/Sep/2023:08:26:57.052] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/21/21 200 655 - - ---- 0/0/0/0/0 0/0 {} "GET http://x.x.x.x:8080/v1/decisions/stream?startup=false HTTP/1.1"

big result 2h later

-:- [03/Sep/2023:08:57:12.210] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/484/495 200 **2373385** - - ---- 1/0/0/0/0 0/0 {} "GET http://x.x.x.x:8080/v1/decisions/stream?startup=false HTTP/1.1"
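
To confirm the 2h pattern across many log lines, the timestamp and response size can be pulled out of each line. The snippet below is only a sketch: the regular expression is an assumption based on the log lines quoted above (HTTP status followed by the byte count, optionally wrapped in the `**` emphasis markers used in this issue), and `parse_line`, `big_responses` and the 1 MB threshold are hypothetical names and values chosen for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the lines quoted in this issue:
# "[timestamp] ... <status> <bytes> ..." where <bytes> may be wrapped in "**".
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\].*?\s(?P<status>\d{3})\s+\**(?P<size>\d+)\**\s"
)

def parse_line(line):
    """Return (timestamp, status, size_in_bytes), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # %b expects English month abbreviations (C / English locale).
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S.%f")
    return ts, int(m.group("status")), int(m.group("size"))

def big_responses(lines, threshold=1_000_000):
    """Return (timestamp, size) for every response larger than `threshold` bytes."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[2] > threshold:
            hits.append((parsed[0], parsed[2]))
    return hits

def gaps_in_seconds(hits):
    """Seconds elapsed between consecutive large responses."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(hits, hits[1:])]
```

Run over the lines above, this should flag only the two ~2.3MB responses, roughly two hours apart.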

What did you expect to happen?

Always incremental updates of decisions.

How can we reproduce it (as minimally and precisely as possible)?

I can provide a dump of my config and local API DB if needed.

Anything else we need to know?

No response

Crowdsec version

 cscli version
2023/09/03 12:17:40 version: v1.4.3-debian-pragmatic-f2528f3e2966d257905cca47fa1fa0e67cc2e2e8
2023/09/03 12:17:40 Codename: alphaga
2023/09/03 12:17:40 BuildDate: 2022-11-30_13:48:34
2023/09/03 12:17:40 GoVersion: 1.19.2
2023/09/03 12:17:40 Platform: linux
2023/09/03 12:17:40 Constraint_parser: >= 1.0, <= 2.0
2023/09/03 12:17:40 Constraint_scenario: >= 1.0, < 3.0
2023/09/03 12:17:40 Constraint_api: v1
2023/09/03 12:17:40 Constraint_acquis: >= 1.0, < 2.0

OS version

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

uname -a
Linux crowdsec.odiso.net 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux



aderumier avatar Sep 03 '23 10:09 aderumier

@aderumier: Thanks for opening an issue, it is currently awaiting triage.

In the meantime, you can:

  1. Check Crowdsec Documentation to see if your issue can be self resolved.
  2. You can also join our Discord.
  3. Check Releases to make sure your agent is on the latest version.

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.

github-actions[bot] avatar Sep 03 '23 10:09 github-actions[bot]

cscli decisions list --all : decision.txt.gz

aderumier avatar Sep 03 '23 10:09 aderumier

haproxy bouncer config:

# haproxy
# path to community_blocklist.map
MAP_PATH=/var/lib/crowdsec/lua/haproxy/community_blocklist.map
# bounce for all type of remediation that the bouncer can receive from the local API
BOUNCING_ON_TYPE=ban
FALLBACK_REMEDIATION=ban
REQUEST_TIMEOUT=3000
UPDATE_FREQUENCY=10
# live or stream
MODE=stream
# exclude the bouncing on those location
EXCLUDE_LOCATION=
#those apply for "ban" action
# /!\ REDIRECT_LOCATION and RET_CODE can't be used together. REDIRECT_LOCATION take priority over RET_CODE
# path to ban template
BAN_TEMPLATE_PATH=
REDIRECT_LOCATION=
RET_CODE=
#those apply for "captcha" action
# ReCaptcha Secret Key
SECRET_KEY=
# Recaptcha Site key
SITE_KEY=
# path to captcha template
CAPTCHA_TEMPLATE_PATH=/var/lib/crowdsec/lua/haproxy/templates/captcha.html
CAPTCHA_EXPIRATION=3600

aderumier avatar Sep 03 '23 10:09 aderumier

I'm not sure, but could it be related to the community-blocklist update? (I'm not sure whether it's sent incrementally.) Maybe crowdsec 1.5 could help; I think there are some new features for this? (I'm planning to upgrade soon.)

aderumier avatar Sep 03 '23 10:09 aderumier

Yes, upgrading to 1.5 and enabling the chunked stream will help you @aderumier, see:

https://github.com/crowdsecurity/cs-haproxy-bouncer/issues/28#issuecomment-1649391294

LaurenceJJones avatar Sep 03 '23 11:09 LaurenceJJones

ok thanks for your fast response!

I'll try to upgrade next week and report back here.

aderumier avatar Sep 03 '23 12:09 aderumier

Hi, I just upgraded to 1.5.

It doesn't seem to work.

The chunked decisions stream is enabled:

#cscli config feature-flags
 --- Enabled features ---

✓ chunked_decisions_stream: Enable chunked decisions stream

but when the community-blocklist is updated

time="04-09-2023 14:08:25" level=info msg="crowdsecurity/community-blocklist : added 15000 entries, deleted 14680 entries (alert:9068725)"

all the haproxy bouncers still receive a big update of about 2MB at the same time:

-:- [04/Sep/2023:14:08:34.299] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/635/762 200 1937404 - - ---- 846/0/0/0/0 0/0 {} "GET http://10.3.95.136:8080/v1/decisions/stream?startup=false HTTP/1.1"

This is almost the same size as the full decisions list retrieved at startup:

-:- [04/Sep/2023:12:22:29.299] <HTTPCLIENT> <HTTPCLIENT>/<HTTPCLIENT> 2/0/0/217/321 200 2616483 - - ---- 0/0/0/0/0 0/0 {} "GET http://10.3.95.136:8080/v1/decisions/stream?startup=true HTTP/1.1"
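
To see what is actually inside one of these large `startup=false` responses, the body can be saved (for example by querying the endpoint by hand with the bouncer's API key) and summarized. This is a minimal sketch, assuming the response is a JSON object with `new` and `deleted` arrays of decisions, each carrying an `origin` field; `summarize_stream` is a hypothetical helper, not part of any CrowdSec tooling.

```python
import json
from collections import Counter

def summarize_stream(body):
    """Count new/deleted decisions in a /v1/decisions/stream response body (str)."""
    data = json.loads(body)
    # Either key may be missing or null when there is nothing to report.
    new = data.get("new") or []
    deleted = data.get("deleted") or []
    return {
        "bytes": len(body.encode("utf-8")),
        "new": len(new),
        "deleted": len(deleted),
        # Assumption: each decision has an "origin" field naming its source.
        "new_by_origin": dict(Counter(d.get("origin", "unknown") for d in new)),
    }
```

If nearly all `new` entries in the big response share the blocklist's origin, that would support the theory that the blocklist pull is re-sent in full.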

aderumier avatar Sep 04 '23 12:09 aderumier

Yeah, it won't change the overall size; it allows the stream to be more performant for CrowdSec and the bouncer. So if you are collecting metrics, they will still show this side effect.

Note that when the blocklist is pulled, the decisions list gets a big update, because all decisions are classed as new even if the IP is the same as in the previous pull. This is because it costs more to iterate over the list than to just insert it as is.
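
For illustration, this is the kind of comparison that is being skipped. The sketch below is not what CrowdSec does; it only shows, with a hypothetical `blocklist_delta` helper and made-up example addresses, how the real change between two pulls could be computed from plain sets of IPs.

```python
def blocklist_delta(previous, current):
    """Compare two blocklist pulls and return what really changed."""
    previous, current = set(previous), set(current)
    return {
        "added": current - previous,      # only in the new pull
        "removed": previous - current,    # only in the old pull
        "unchanged": previous & current,  # re-sent as "new" although identical
    }
```

With the log line above (15000 added, 14680 deleted), the `unchanged` set would tell how much of the update is the same IPs being inserted again.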

LaurenceJJones avatar Sep 04 '23 13:09 LaurenceJJones

Hmm, ok. I was thinking that big lists from the central API would be sent slowly, in chunks, to the bouncer.

My problem is more that the local API is shared by multiple bouncers, and when the blocklist is updated, all the bouncers load the big list at exactly the same time.

It's not a problem for the crowdsec server, but all the bouncers are VMs, and they may be located on the same hypervisor. That means the hypervisor CPU usage jumps to 40 cores for 30s if I have 40 bouncers on that host (it takes around one core at 100% for 30s to update the "big" list of 15000 IPs). I'm not even thinking about enabling other lists for now.
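
The hypervisor load can be estimated as the peak number of bouncers updating at once. A small sketch, assuming each update keeps one core busy for a fixed duration (the 40 bouncers and 30s come from the numbers above; `peak_concurrency` and the 3s spacing are hypothetical):

```python
def peak_concurrency(start_times, duration):
    """Peak number of overlapping updates, each lasting `duration` seconds."""
    events = []
    for start in start_times:
        events.append((start, 1))              # update begins
        events.append((start + duration, -1))  # update ends
    # At equal timestamps, process ends (-1) before starts (+1).
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# All 40 bouncers start together: 40 cores busy at once.
print(peak_concurrency([0] * 40, 30))
# Starts spread 3s apart: far fewer updates overlap.
print(peak_concurrency([i * 3 for i in range(40)], 30))
```

The total CPU time is the same in both cases; only the peak changes.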

I don't see any CPU improvement on the bouncer side with chunking enabled.

As a workaround, I'll look at whether I can lower the CPU usage with a higher msleep in the Lua bouncer, and maybe also raise UPDATE_FREQUENCY to 1min instead of 10s, so there is a better chance that not all bouncers update at the same time.

In my use case, some kind of random update time for the blocklist on the bouncer side could work too.
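
A rough sketch of that idea, written in Python rather than the bouncer's Lua and not taken from the bouncer's code: each host derives a stable offset from its own name, and optionally adds random jitter to every poll delay. `stable_offset`, `jittered_delay` and the host name used below are hypothetical.

```python
import hashlib
import random

def stable_offset(host_id, window_seconds):
    """Deterministic per-host offset in [0, window_seconds), derived from the host name."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds

def jittered_delay(base_seconds, jitter_fraction, rng=random):
    """Base delay plus a random extra of up to base_seconds * jitter_fraction."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(0, spread)

# Example: a host delays its first poll by its own offset within a 120s window,
# then polls every 10s to 15s instead of exactly every 10s.
first_delay = stable_offset("haproxy-01", 120)
next_delay = jittered_delay(10, 0.5)
```

The stable offset keeps each host's schedule predictable, while the jitter stops the hosts from drifting back into step over time.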

Question: is it possible to change the 2h interval of the blocklist update? (Once a day is enough for me.) Or maybe schedule it through cron?

aderumier avatar Sep 04 '23 15:09 aderumier