jsmaupin
I decided to run a test outside of Dask and Streamz to see what numbers could be achieved. I inserted 10,000,000 1 KB messages into a topic with 20 partitions into...
One other thing I should note is that `.consume(...)` blocks if the batch is not full. Therefore, I exit once `last_offset >= high - batch_size`.
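A minimal sketch of that stop condition, pulled out as a pure function (the name `should_stop` and the example numbers are illustrative, not from the actual script):

```python
def should_stop(last_offset: int, high: int, batch_size: int) -> bool:
    """Stop consuming once fewer than batch_size messages remain,
    because consume(batch_size, ...) would otherwise block while
    waiting to fill a full batch."""
    return last_offset >= high - batch_size

# With a 10M-message topic and batches of 10,000:
print(should_stop(9_990_000, 10_000_000, 10_000))  # True  (exactly one batch left)
print(should_stop(9_989_999, 10_000_000, 10_000))  # False (more than one batch left)
```

In the real loop, `high` would come from something like the consumer's watermark offsets and `last_offset` from the last message in each consumed batch.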
It seems like calling `.value()` has a significant impact on performance. I know there's no way around this, since we need the value of the message. But I'm just going to...
My guess is that the message is fetched by the C extension, but not handed to the Python runtime until `.value()` is called.
I've written another version of my script above that uses Dask. I'm waiting for a new pre-populating script I wrote to finish running. I can try yours right after.
It seems that nearly the same script as before, converted to Dask, takes significantly longer; it does not even finish. I am new to Dask, so maybe someone...
Ok, I've changed it so that it is 1 partition per task. I'm getting almost 700 MB/s.

```python
from time import time
import confluent_kafka as ck
from distributed import Client, ...
```
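The partition-per-task decomposition can be sketched without a broker or a Dask cluster; with Dask it would be one `client.submit(...)` per partition, but here the stdlib executor stands in and the reader is a stub (the function name, worker count, and per-partition counts are assumptions, not the actual script):

```python
from concurrent.futures import ThreadPoolExecutor

N_PARTITIONS = 20
MSGS_PER_PARTITION = 500_000   # 10M messages spread over 20 partitions
MSG_SIZE = 1024                # 1 KB per message

def read_partition(partition: int) -> tuple[int, int]:
    # Stand-in for the real task body, which would create a
    # ck.Consumer, assign() it this single partition, and
    # consume() in batches until the high watermark is reached.
    return MSGS_PER_PARTITION, MSGS_PER_PARTITION * MSG_SIZE

# One task per partition, so no two tasks contend for the same partition.
with ThreadPoolExecutor(max_workers=N_PARTITIONS) as ex:
    results = list(ex.map(read_partition, range(N_PARTITIONS)))

total_msgs = sum(m for m, _ in results)
total_bytes = sum(b for _, b in results)
print(total_msgs, total_bytes)  # 10000000 10240000000
```

The point of the layout is that each task owns exactly one partition, so consumers never rebalance or compete within a task.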
After changing the above to use 20 workers and only 1 task, I'm seeing rates of over 1 GB/s.

```
read: 10000000 msgs 10240000000 b in: 9.476426839828491 s rate: 1030.517637613793 mb/s
```

I...
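For reference, the printed rate is consistent with bytes over wall time in MiB/s (assuming the script divided by `2**20`, despite the `mb/s` label):

```python
msgs = 10_000_000
nbytes = msgs * 1024           # 10,240,000,000 bytes, i.e. 1 KB per message
elapsed = 9.476426839828491    # seconds, from the printed output

rate_mib = nbytes / elapsed / 2**20
print(f"{rate_mib:.1f} MiB/s")  # ~1030.5, matching the reported rate
```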
@skmatti, is this improved now? Can you post your results?
The Tuesday times work for me.