
About using memory

Open skyoct opened this issue 3 years ago • 1 comments

Hello. When I set the batch count to 20,000, container memory usage is approximately 8 GB. Here is the relevant configuration:

input:
  kafka: xxxx

output:
  minio: xxxx

buffer:
  memory:
    limit: 125829120
    batch_policy:
      enabled: true
      byte_size: 29360128
      count: 20000
      period: 30s
      processors:
        - archive:
            format: lines

But when I lower the batch count to 5000, memory usage reaches 16 GB. Here is the corresponding configuration:

input:
  kafka: xxxx

output:
  minio: xxxx

buffer:
  memory:
    limit: 125829120
    batch_policy:
      enabled: true
      byte_size: 29360128
      count: 5000
      period: 30s
      processors:
        - archive:
            format: lines

I would like to ask what might be causing this. Also, why is there such a big difference between the buffer's configured memory limit and the memory actually used?

skyoct avatar Jun 13 '22 01:06 skyoct

Hey @skyoct, while the batch you construct is in flight and being processed, the memory buffer is free to begin filling up again, and this repeats for any number of parallel threads or in-flight output capacity. If you're processing a backlog of Kafka messages, you need to account for several in-flight instances of the batcher.
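To put rough numbers on that, here is a minimal back-of-the-envelope sketch (not anything Benthos itself computes). It assumes each in-flight batch can be as large as `byte_size` and that the buffer can refill to `limit` in the meantime. The function name and the in-flight counts are hypothetical, chosen only for illustration. Real usage can be higher, since this ignores allocator overhead and any copies made while processing.

```python
MIB = 1024 * 1024


def estimate_peak_bytes(buffer_limit, batch_byte_size, in_flight_batches):
    """Rough estimate: a full buffer plus every in-flight batch at its maximum size."""
    return buffer_limit + batch_byte_size * in_flight_batches


buffer_limit = 125829120    # buffer.memory.limit from the configs above (120 MiB)
batch_byte_size = 29360128  # batch_policy.byte_size from the configs above (28 MiB)

# Hypothetical in-flight counts; the real number depends on the input and
# output parallelism of the pipeline, which the thread does not specify.
for in_flight in (1, 4, 16, 64):
    peak = estimate_peak_bytes(buffer_limit, batch_byte_size, in_flight)
    print(f"{in_flight:>3} in-flight batches: roughly {peak / MIB:.0f} MiB")
```

The point is only that the buffer `limit` bounds the buffer itself, not the batches that have already left it, so total memory grows with the number of batches in flight.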

I generally wouldn't aim to create batches that are large enough to be significant in proportion to the memory of the running machine. The nature of streaming systems is to keep data moving in order to avoid capacity issues like this.

Jeffail avatar Jun 14 '22 17:06 Jeffail