SNNNNX

Results: 4 issues of SNNNNX

How to get throughput and latency information? Throughput can be measured at the last step (operator): simply log how many records are received in a period of time.
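
A minimal sketch of such a throughput probe (the names ThroughputLogger and onRecord are made up here, not from these notes): call onRecord() for every element seen at the last operator, and it prints the record rate once per logging interval.

    // Minimal throughput probe (illustrative, framework-agnostic).
    // Call onRecord() for each element at the last operator; it logs the
    // number of records per second once per interval.
    class ThroughputLogger(intervalMillis: Long = 1000L) {
      private var count = 0L
      private var windowStart = System.currentTimeMillis()

      def onRecord(): Unit = {
        count += 1
        val now = System.currentTimeMillis()
        if (now - windowStart >= intervalMillis) {
          val seconds = (now - windowStart) / 1000.0
          println(f"throughput: ${count / seconds}%.1f records/s")
          count = 0L
          windowStart = now
        }
      }
    }

In Flink this could be wrapped in a map function placed just before the sink; in Spark Streaming it could be called from foreachRDD on the final DStream.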

Flink streaming word count loses data with the operator chain window().groupBy(0).reduceWindow().flatten().groupBy(0).reduce().

Flink needs to do the group operation before the window; Spark does not. What's more, Spark should avoid groupByKey: https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
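
A quick sketch of the recommendation from that link, assuming a local word-count-style RDD of (word, 1) pairs (the sample data and app name are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("word-count-sketch").setMaster("local[*]"))
        val wordPairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

        // Preferred: reduceByKey combines values on each partition before the shuffle.
        val countsReduce = wordPairs.reduceByKey(_ + _)

        // Avoid: groupByKey ships every (word, 1) pair across the network, then sums.
        val countsGroup = wordPairs.groupByKey().mapValues(_.sum)

        countsReduce.collect().foreach(println)
        countsGroup.collect().foreach(println)
        sc.stop()
      }
    }

Both produce the same counts; the difference is that reduceByKey does map-side combining while groupByKey shuffles all the raw pairs first, which is why the linked best practice prefers reduceByKey.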