Throughput and latency
How do we get throughput and latency information? Throughput can be measured at the last step (operator): simply log how much data is received over a period of time.
Ideally, throughput should be logged at each operator step, not only the last one.
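A minimal sketch of the per-step throughput logging described above: a counter that each operator step feeds, reporting records per second over fixed intervals. The class name, interval, and injectable clock are illustrative choices, not part of any Spark API.

```python
import time

class ThroughputMeter:
    """Counts records and reports throughput over fixed intervals."""

    def __init__(self, interval_s=1.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.count = 0
        self.window_start = clock()
        self.reports = []  # (records, records_per_sec) per completed window

    def record(self, n=1):
        """Call from the operator step each time records are received."""
        self.count += n
        now = self.clock()
        elapsed = now - self.window_start
        if elapsed >= self.interval_s:
            self.reports.append((self.count, self.count / elapsed))
            self.count = 0
            self.window_start = now

# Deterministic demo with a fake clock instead of wall time.
t = [0.0]
meter = ThroughputMeter(interval_s=1.0, clock=lambda: t[0])
for _ in range(500):
    meter.record()
t[0] = 1.0          # one second elapses
meter.record(100)   # this call closes the window
print(meter.reports)  # [(600, 600.0)]
```

In a real job, one meter per operator step gives the per-step breakdown; the final step's meter gives the end-to-end figure.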
Spark Streaming latency is a little special to test: reduceByKey is not a pipelined operator, so records accumulate until the whole current batch has been processed. Latency therefore has to be measured end to end, including the batching delay.
If there is already data in the Kafka topic, the first time Spark Streaming reads from that topic it will consume all of the existing data, which skews the first batches of a benchmark.
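Whether the backlog is replayed is governed by the Kafka consumer's auto.offset.reset setting for a new consumer group: "earliest" replays retained data, "latest" starts from new records only. The broker address and group id below are placeholder values.

```python
# Kafka consumer parameters controlling where a new consumer group starts.
# "earliest" replays all retained data in the topic;
# "latest" skips the backlog and reads only records that arrive
# after the streaming job starts (usually what a benchmark wants).
kafka_params = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "benchmark-consumer",       # placeholder group name
    "auto.offset.reset": "latest",
}
print(kafka_params["auto.offset.reset"])  # latest
```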
Spark throughput and latency can also be inspected in the Spark Streaming UI's built-in visualizations: https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html
The next steps are a parallel data generator and statistics for the performance logs.
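A parallel data generator can be sketched as several worker threads pushing records into a shared queue; the worker count and record format here are illustrative assumptions.

```python
import queue
import threading

def generator(out_q, n_records, worker_id):
    """One worker producing records; several run in parallel."""
    for i in range(n_records):
        # In a real benchmark this would send to Kafka instead.
        out_q.put(f"worker-{worker_id}-record-{i}")

def run_parallel(n_workers=4, records_per_worker=1000):
    out_q = queue.Queue()
    threads = [
        threading.Thread(target=generator, args=(out_q, records_per_worker, w))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out_q

q = run_parallel()
print(q.qsize())  # 4000
```

Replacing the queue with a Kafka producer per thread turns this into a load generator whose aggregate rate scales with the worker count.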