StreamBench
Flink window and Spark window
Flink needs a group operation before a window can be applied; Spark does not. Moreover, Spark code should avoid groupByKey, see https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
Avoid grouping throughout the whole bench system; use reduceByKey or windowReduceByKey instead.
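The reason to prefer reduceByKey can be illustrated without a cluster. The following is a minimal sketch in plain Python, not Spark code: the two lists stand in for partitions on two nodes, and the helper names `group_by_key` and `reduce_by_key` are hypothetical. It assumes the behaviour described in the linked Databricks note, namely that reduceByKey combines values inside each partition before the shuffle while groupByKey ships every record.

```python
from collections import defaultdict

# Two hypothetical partitions of (word, count) pairs, standing in for two nodes.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """Ship every record across the shuffle, then sum the values per key."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def reduce_by_key(parts, func):
    """Combine inside each partition first, then shuffle one record per key per partition."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


grouped_result, grouped_shuffled = group_by_key(partitions)
reduced_result, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)
```

Both functions return the same per-key totals; the second element of each result is the number of records that would cross the network, which is smaller for the reduce-style variant whenever a key repeats within a partition.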
For a window on a non-grouped stream, Spark has windows on each node, whereas Flink has only one global window on a single node.
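The difference in placement can be sketched as a toy model. This is plain Python, not Flink or Spark code; `NUM_WORKERS`, the sample events, and both helper functions are hypothetical, and the key-to-worker mapping is a simple deterministic stand-in for a real partitioner. It only illustrates the contrast described above: grouped windows spread over workers, while a single global window puts every event on one worker.

```python
from collections import defaultdict

NUM_WORKERS = 3  # hypothetical parallelism

# Hypothetical (key, value) events arriving within one window.
events = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("b", 5), ("c", 6)]


def keyed_window_assignment(stream, num_workers):
    """Grouped stream: each key's window lives on the worker that owns the key."""
    workers = defaultdict(list)
    for key, value in stream:
        # Deterministic stand-in for a hash partitioner.
        worker = sum(key.encode()) % num_workers
        workers[worker].append((key, value))
    return dict(workers)


def global_window_assignment(stream):
    """Non-grouped stream with one global window: every event lands on worker 0."""
    return {0: list(stream)}


keyed = keyed_window_assignment(events, NUM_WORKERS)
global_window = global_window_assignment(events)
```

In this model the keyed assignment uses all three workers with two events each, while the global window leaves a single worker holding all six events, which is why a non-grouped window can become a bottleneck.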