StreamBench

Throughput and latency

Status: Open • 0x7aF777 opened this issue 10 years ago • 4 comments

How can we get throughput and latency information? Throughput can be measured at the last step (operator) by simply logging how many records are received in a given period of time.

0x7aF777 • Dec 05 '15 22:12
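A minimal sketch of the windowed counting described above, written in Python purely for illustration. `ThroughputLogger` and its parameters are hypothetical names, not part of StreamBench; the injectable `clock` exists only so the example can be driven deterministically.

```python
import time


class ThroughputLogger:
    """Hypothetical helper: counts records seen by one operator and
    reports records/second for each fixed-length time window."""

    def __init__(self, name, window_seconds=1.0, clock=time.monotonic):
        self.name = name
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.reports = []  # (window_start, records_per_second) per closed window

    def record(self, n=1):
        """Call once per received record (or once per group of n records)."""
        now = self.clock()
        # Close every window that has fully elapsed; idle windows report 0.0.
        while now - self.window_start >= self.window_seconds:
            rate = self.count / self.window_seconds
            self.reports.append((self.window_start, rate))
            self.window_start += self.window_seconds
            self.count = 0
        self.count += n


class FakeClock:
    """Manually advanced clock, used only to make the demo deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
sink = ThroughputLogger("sink", window_seconds=1.0, clock=clock)
for t in [0.1, 0.2, 0.5, 1.2, 1.3, 2.5]:
    clock.t = t
    sink.record()
print(sink.reports)  # [(0.0, 3.0), (1.0, 2.0)]
```

The window starting at 2.0 s is still open (one record so far), so it is not reported yet; a real logger would also flush on shutdown.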

Throughput should be logged at each operator step.

0x7aF777 • Dec 07 '15 10:12
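One way to log throughput at every operator rather than only at the sink is to wrap each stage so it counts what it emits. The sketch below assumes a hypothetical word-count pipeline built from Python generators; `counted` and `word_count` are illustrative names, not StreamBench or Spark APIs.

```python
import time
from collections import Counter


def counted(stage, records, counts):
    """Yield records unchanged while counting how many pass this stage."""
    for r in records:
        counts[stage] += 1
        yield r


def word_count(lines):
    """Hypothetical word-count pipeline with a counter after every operator."""
    counts = Counter()
    start = time.perf_counter()
    src = counted("source", lines, counts)
    words = counted("flatMap", (w for line in src for w in line.split()), counts)
    long_words = counted("filter", (w for w in words if len(w) > 3), counts)
    result = Counter(long_words)  # terminal aggregation (the "reduce" step)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    rates = {stage: n / elapsed for stage, n in counts.items()}
    return result, counts, rates


result, counts, rates = word_count(["the quick brown fox", "jumps over the lazy dog"])
print(dict(counts))  # {'source': 2, 'flatMap': 9, 'filter': 5}
```

Comparing per-stage counts shows where records are multiplied (flatMap) or dropped (filter), which a sink-only counter cannot reveal.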

Latency in Spark Streaming is somewhat tricky to test. The reduceByKey operator is not a pipelined operator: records accumulate until the current batch has been processed.
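Because a blocking operator such as reduceByKey emits nothing until the whole micro-batch is done, a record's latency depends on where it falls inside the batch interval. The sketch below works this out under simplifying, hypothetical assumptions (no scheduling delay, constant processing time); `batch_latencies` is an illustrative name, not a Spark API.

```python
def batch_latencies(event_times, batch_interval, processing_time):
    """Per-record end-to-end latency when a blocking operator such as
    reduceByKey emits nothing until its micro-batch is fully processed.

    Simplifying assumptions: the batch covering [k*B, (k+1)*B) starts
    processing exactly at (k+1)*B (no scheduling delay) and finishes
    processing_time seconds later.
    """
    latencies = []
    for t in event_times:
        k = int(t // batch_interval)             # which batch the record lands in
        batch_cut = (k + 1) * batch_interval     # batch closes at the end of its interval
        output_time = batch_cut + processing_time
        latencies.append(output_time - t)
    return latencies


lat = batch_latencies([0.0, 0.5, 0.9, 1.1], batch_interval=1.0, processing_time=0.3)
print([round(x, 3) for x in lat])  # [1.3, 0.8, 0.4, 1.2]
```

Under these assumptions, with arrivals spread uniformly over the interval, the mean latency is roughly half the batch interval plus the processing time, so the batch interval itself is part of the measured latency.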

If a Kafka topic already contains data, the first time Spark Streaming consumes that topic it will read all of the existing data.

0x7aF777 • Dec 08 '15 16:12
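Since the first batch may drain the topic's existing backlog, one option is to treat it as warm-up and leave it out of the statistics. This is a hedged sketch: the `(num_records, processing_seconds)` tuple format and the `summarize` helper are hypothetical, and the numbers in the demo are made up for illustration.

```python
from statistics import mean


def summarize(batches, warmup=1):
    """Summarize (num_records, processing_seconds) pairs, one per micro-batch,
    after dropping the first `warmup` batches, which may include the topic's
    pre-existing backlog rather than steady-state input."""
    steady = batches[warmup:]
    if not steady:
        raise ValueError("no batches left after discarding warm-up batches")
    return {
        "batches": len(steady),
        "mean_records": mean(n for n, _ in steady),
        "mean_processing_s": mean(p for _, p in steady),
    }


# Hypothetical numbers: the first batch drained a large backlog.
stats = summarize([(100_000, 12.0), (1_000, 0.5), (1_200, 0.7)])
print(stats["batches"], stats["mean_records"])  # 2 1100
```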

Spark throughput and latency can be found in Spark's own built-in UI (the Streaming tab): https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html

0x7aF777 • Dec 11 '15 21:12
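The Streaming tab reports per-batch input rate, scheduling delay, processing time and total delay, where Spark defines total delay as scheduling delay plus processing time; a stream keeps up only if processing time stays within the batch interval. The sketch below recomputes those figures offline from per-batch numbers. `BatchInfo` and `streaming_stats` are hypothetical names, not Spark's listener classes.

```python
from dataclasses import dataclass


@dataclass
class BatchInfo:
    """Hypothetical per-batch record mirroring what the Streaming tab shows."""
    num_records: int
    scheduling_delay_s: float
    processing_time_s: float


def streaming_stats(batches, batch_interval_s):
    """Derive input rate, total delay and a keeping-up flag for each batch."""
    rows = []
    for b in batches:
        rows.append({
            "input_rate": b.num_records / batch_interval_s,  # records per second
            "total_delay_s": b.scheduling_delay_s + b.processing_time_s,
            "keeping_up": b.processing_time_s <= batch_interval_s,
        })
    return rows


rows = streaming_stats(
    [BatchInfo(2000, 0.0, 1.5), BatchInfo(3000, 0.5, 2.5)],
    batch_interval_s=2.0,
)
for row in rows:
    print(row)
```

In the second batch, processing takes longer than the 2 s interval, so later batches would start accumulating scheduling delay.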

The next steps are a parallel data generator and logging of performance statistics.

0x7aF777 • Dec 12 '15 23:12
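A parallel data generator can run several workers that each stamp records with their creation time, so the sink can later compute end-to-end latency. This sketch is an assumption-laden illustration: a `queue.Queue` stands in for the Kafka topic, and `generator_worker`, `run_generators` and the record fields are hypothetical.

```python
import queue
import threading
import time


def generator_worker(worker_id, num_records, out, clock=time.time):
    """Emit num_records records, each tagged with its creation timestamp."""
    for seq in range(num_records):
        out.put({
            "worker": worker_id,
            "seq": seq,
            "created_at": clock(),  # lets the consumer compute latency later
            "payload": f"w{worker_id}-{seq}",
        })


def run_generators(num_workers, records_per_worker):
    out = queue.Queue()  # stands in for the Kafka topic in this sketch
    threads = [
        threading.Thread(target=generator_worker, args=(i, records_per_worker, out))
        for i in range(num_workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    # All producers have finished, so draining until empty is safe here.
    records = []
    while not out.empty():
        records.append(out.get())
    return records, elapsed


records, elapsed = run_generators(num_workers=4, records_per_worker=250)
print(len(records), "records generated in", round(elapsed, 4), "s")
```

CPython threads do not execute Python code in parallel, so a generator meant to saturate a real cluster would more likely use separate processes or JVM threads feeding a Kafka producer; the structure (independent workers plus a creation timestamp per record) stays the same.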