jsmaupin
I decided to run a test outside of Dask and Streamz to see what numbers could be achieved. I inserted 10,000,000 1 KB messages into a topic with 20 partitions into...
One other thing I should note is that `.consume(...)` blocks if the batch is not full. Therefore, I exit once `last_offset >= high - batch_size`.
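A minimal sketch of that stop condition, pulled out as a pure function (the name `should_stop` and the example numbers are illustrative, not from the actual script):

```python
def should_stop(last_offset: int, high: int, batch_size: int) -> bool:
    """Stop consuming once fewer than batch_size messages remain,
    because consume(batch_size, ...) would otherwise block while
    waiting to fill a full batch."""
    return last_offset >= high - batch_size

# With a 10M-message topic and batches of 10,000:
print(should_stop(9_990_000, 10_000_000, 10_000))  # True  (exactly one batch left)
print(should_stop(9_989_999, 10_000_000, 10_000))  # False (more than one batch left)
```

In the real loop, `high` would come from something like the consumer's watermark offsets and `last_offset` from the last message in each consumed batch.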
It seems like calling `.value()` has a significant impact on performance. I know there's no way around this, since we need the value of the message. But I'm just going to...
My guess is that the message is fetched by the C extension, but not handed to the Python runtime until `.value()` is called.
I've written another version of my script above that uses Dask. I'm waiting for a new pre-populating script I wrote to finish running. I can try yours right after.
It seems that nearly the same script as before, converted to Dask, takes significantly longer; it does not even finish. I am new to Dask, so maybe someone...
Ok, I've changed it so that it is 1 partition per task. I'm getting almost 700 MB/s.

```python
from time import time
import confluent_kafka as ck
from distributed import Client, ...
```
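The partition-per-task decomposition can be sketched without a broker or a Dask cluster; with Dask it would be one `client.submit(...)` per partition, but here the stdlib executor stands in and the reader is a stub (the function name, worker count, and per-partition counts are assumptions, not the actual script):

```python
from concurrent.futures import ThreadPoolExecutor

N_PARTITIONS = 20
MSGS_PER_PARTITION = 500_000   # 10M messages spread over 20 partitions
MSG_SIZE = 1024                # 1 KB per message

def read_partition(partition: int) -> tuple[int, int]:
    # Stand-in for the real task body, which would create a
    # ck.Consumer, assign() it this single partition, and
    # consume() in batches until the high watermark is reached.
    return MSGS_PER_PARTITION, MSGS_PER_PARTITION * MSG_SIZE

# One task per partition, so no two tasks contend for the same partition.
with ThreadPoolExecutor(max_workers=N_PARTITIONS) as ex:
    results = list(ex.map(read_partition, range(N_PARTITIONS)))

total_msgs = sum(m for m, _ in results)
total_bytes = sum(b for _, b in results)
print(total_msgs, total_bytes)  # 10000000 10240000000
```

The point of the layout is that each task owns exactly one partition, so consumers never rebalance or compete within a task.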
After changing the above to use 20 workers and only 1 task, I'm seeing rates of over 1 GB/s.

```
read: 10000000 msgs 10240000000 b in: 9.476426839828491 s rate: 1030.517637613793 mb/s
```

I...
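For reference, the printed rate is consistent with bytes over wall time in MiB/s (assuming the script divided by `2**20`, despite the `mb/s` label):

```python
msgs = 10_000_000
nbytes = msgs * 1024           # 10,240,000,000 bytes, i.e. 1 KB per message
elapsed = 9.476426839828491    # seconds, from the printed output

rate_mib = nbytes / elapsed / 2**20
print(f"{rate_mib:.1f} MiB/s")  # ~1030.5, matching the reported rate
```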
@skmatti, is this improved now? Can you post your results?
The Tuesday times work for me.