Gyewon Lee

Results 20 issues of Gyewon Lee

This PR resolves #1131 via
* Implementing recovery-based scaling out.
* Adding the necessary Avro formats and function calls.

The current implementation of scale-in/out is based on fault recovery, which can cause high latency. We need to make it more efficient in a later version.

The current PR, #1114, does not support task recovery while the master has not yet recovered. We need to fix this issue after merging #1114.

We need to measure the recovery time so that we can compare the recovery speed of each policy.

experiment

Currently, the code (PR #1011) only considers CPU utilization when allocating queries. However, there are cases where we should also consider other metrics (memory, network, ...) when allocating queries. To deal with...

scale-out
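One possible way to take several metrics into account is a weighted load score. The following is only a minimal sketch under assumptions: `NodeStats`, `load_score`, `pick_node`, and the weights are hypothetical names and values for illustration, not taken from PR #1011, and each metric is assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node utilization, each value normalized to [0, 1]."""
    name: str
    cpu: float
    memory: float
    network: float


# Assumed weights for illustration only; real values would need tuning.
WEIGHTS = {"cpu": 0.5, "memory": 0.3, "network": 0.2}


def load_score(stats, weights=WEIGHTS):
    """Combine the metrics into a single weighted load score."""
    return (weights["cpu"] * stats.cpu
            + weights["memory"] * stats.memory
            + weights["network"] * stats.network)


def pick_node(nodes, weights=WEIGHTS):
    """Return the node with the lowest combined load score."""
    return min(nodes, key=lambda n: load_score(n, weights))
```

With a score like this, a node with low CPU usage but nearly exhausted memory is no longer chosen automatically, which a CPU-only policy would do.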

To use resources efficiently, we need to implement automatic scale-in and scale-out according to changes in load.

scale-out
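A simple starting point for automatic scaling is a threshold rule on the average load. The sketch below is an assumption for illustration, not the project's design: `scaling_decision` and the `high`, `low`, and `min_nodes` parameters are hypothetical, and loads are assumed to be normalized to the range 0 to 1.

```python
def scaling_decision(loads, high=0.8, low=0.3, min_nodes=1):
    """Decide whether to scale, given the current per-node loads.

    Returns "scale-out", "scale-in", or "none". The thresholds are
    assumed values; keeping a gap between them avoids oscillation.
    """
    if not loads:
        return "none"
    average = sum(loads) / len(loads)
    if average > high:
        return "scale-out"
    # Never shrink below the minimum number of nodes.
    if average < low and len(loads) > min_nodes:
        return "scale-in"
    return "none"
```

Keeping `low` well below `high` leaves a band in which no action is taken, so a small fluctuation in load does not trigger repeated scaling.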

After reallocation, we need to update the master's information if anything changed that the master should know about.

query reallocation
scale-out

To reduce recovery time, it is necessary to distribute the queries in one group across multiple machines to reduce dynamic query loading as well as maximize node parallelism in...

scale-out

Currently, it does not handle query deletion. As a result, MQTT clients remain alive even after no sinks or sources are using them. We need to consider making MQTT...

Based on stats from the executors (engines), the master needs to select pairs of executors to rebalance and send rebalancing messages to them.

query reallocation
scale-out
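One way the pair selection could work is to match the most loaded executor with the least loaded one, repeating until the load gap becomes small. This is a sketch under assumptions, not the project's algorithm: `select_rebalance_pairs` and the `gap` threshold are hypothetical, and each executor is assumed to report a single load value between 0 and 1.

```python
def select_rebalance_pairs(loads, gap=0.2):
    """Pair heavily loaded executors with lightly loaded ones.

    `loads` maps an executor name to its reported load. Returns a list
    of (overloaded, underloaded) pairs whose load difference is at
    least `gap` (an assumed threshold).
    """
    ordered = sorted(loads.items(), key=lambda item: item[1])
    pairs = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        light_name, light_load = ordered[lo]
        heavy_name, heavy_load = ordered[hi]
        # The remaining executors are closer together than this pair,
        # so stop once the gap is too small to justify moving queries.
        if heavy_load - light_load < gap:
            break
        pairs.append((heavy_name, light_name))
        lo += 1
        hi -= 1
    return pairs
```

The master would then send a rebalancing message to both executors in each returned pair; executors left unpaired keep their current queries.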