About using memory
Hello. When I set the batch count to 20,000, container memory usage is approximately 8 GB. Here is the configuration:
input:
  kafka: xxxx
output:
  minio: xxxx
buffer:
  memory:
    limit: 125829120
    batch_policy:
      enabled: true
      byte_size: 29360128
      count: 20000
      period: 30s
      processors:
        - archive:
            format: lines
But when I lower the batch count to 5,000, memory usage reaches 16 GB. Here is the corresponding configuration:
input:
  kafka: xxxx
output:
  minio: xxxx
buffer:
  memory:
    limit: 125829120
    batch_policy:
      enabled: true
      byte_size: 29360128
      count: 5000
      period: 30s
      processors:
        - archive:
            format: lines
What could be causing this? Also, why is there such a big difference between the configured buffer limit and the memory actually used?
Hey @skyoct, while the batch you construct is in-flight and being processed, the memory buffer is free to begin filling up again, and this repeats for any number of parallel threads or in-flight output capacity. If you're processing a backlog of Kafka messages, then you need to account for several in-flight instances of the batcher.
I generally wouldn't aim to create batches that are large enough to be significant in proportion to the memory of the running machine. The nature of streaming systems is to keep data moving in order to avoid capacity issues like this.
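To make the point about in-flight batches concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simple model that is not taken from the Benthos source: peak payload memory is roughly one full buffer plus one full batch per in-flight batch. The function name `estimate_peak_bytes` and the in-flight counts are hypothetical, chosen only for illustration. The model ignores runtime overhead such as garbage collection and any copies made by processors, so real usage can be considerably higher.

```python
MIB = 1024 * 1024


def estimate_peak_bytes(buffer_limit: int, batch_bytes: int, in_flight: int) -> int:
    """Rough estimate (assumed model): a full buffer plus every in-flight batch."""
    return buffer_limit + in_flight * batch_bytes


buffer_limit = 125829120  # buffer.memory.limit from the config above (120 MiB)
batch_bytes = 29360128    # batch_policy.byte_size from the config above (28 MiB)

# The in-flight counts below are illustrative guesses, not measured values.
for in_flight in (1, 4, 16, 64):
    peak = estimate_peak_bytes(buffer_limit, batch_bytes, in_flight)
    print(f"{in_flight:>3} in-flight batches -> ~{peak / MIB:.0f} MiB of payload data")
```

Even under this simplified model, the buffer limit only bounds the data waiting in the buffer, not the batches that have already left it and are still being processed or written.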