Does output batching support multithreading?
Hello, I read the batcher code and it appears to process batches on a single thread. Is there any way to make this conversion to Parquet files faster? https://github.com/benthosdev/benthos/blob/811c58786a46085861a828f7fd606e659f872253/internal/component/output/batcher/batcher.go#L63-L156
batching:
  byte_size: 125829120
  count: 20000
  period: 30s
  processors:
    - parquet:
        compression: snappy
        operator: from_json
        schema: ''
Hey @skyoct, you could move the batching mechanism up to the input level, and then perform the processing within pipeline.processors where you can have parallel processing threads, something like this:
input:
  foo:
    batching:
      byte_size: 125829120
      count: 20000
      period: 30s
pipeline:
  processors:
    - parquet:
        compression: snappy
        operator: from_json
        schema: ''
output:
  bar: {}
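For reference, the number of parallel processing threads is set with the threads field of the pipeline section. The snippet below is a minimal sketch of that section only. The value 4 is an arbitrary assumption for illustration, and the default value differs between Benthos versions, so check the docs for the version you run. As far as I understand it, each batch formed at the input is handed to one thread as a whole, so the speedup comes from several batches being converted at the same time, not from splitting a single batch.

```yaml
pipeline:
  # Assumption: 4 is an illustrative value, tune it to the cores you have available.
  threads: 4
  processors:
    - parquet:
        compression: snappy
        operator: from_json
        schema: ''
```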
If the specific input you're using doesn't have a batching field, then wrap it in a broker input and set the batching there:
input:
  broker:
    inputs:
      - foo: {}
    batching:
      byte_size: 125829120
      count: 20000
      period: 30s