Yi Cheng comments

Results: 77 comments of Yi Cheng

[core][scalability] Change ray syncer from unary call to streaming call

> if we have fault tolerance semantics documented somewhere.

@rkooo567 I actually documented the workflow [here](https://github.com/ray-project/ray/blob/972caacc365d7bf17f7e3916cbbc41b7196b0654/src/ray/common/ray_syncer/ray_syncer-inl.h#L91-L117). It at least gives the developer an overview of how the state changes.

[core][scalability] Change ray syncer from unary call to streaming call

GCS FT test fixed.

[core][scalability] Change ray syncer from unary call to streaming call

All tests passed when the flag is on: https://buildkite.com/ray-project/oss-ci-build-pr/builds/10260. The ray syncer test failed under ASAN. I'm going to take another look, but I feel it's close.

[core] Support Redis with replicas

@edoakes I think it depends on the server side. As long as the master in the cluster is alive, it should work. So let's say: 1. Ray is running and...

[core] Support Redis with replicas

> > 1. Ray is running and the master failed => GCS won't be back. (This can be improved in the future with more work).
> > 2. Start a...

[core] Support Redis with replicas

@edoakes I think your point about the DB becoming a single point of failure makes sense. The challenging part is that if the master is down, we need to redirect all the requests...

[core] Support Redis with replicas

@edoakes this is correct. I think one step further is to make it work when the master is down. We have two ways: 1. improve the current code...

[core] Support Redis with replicas

@edoakes [here](https://github.com/ray-project/ray/blob/master/src/ray/gcs/store_client/store_client.h#L59) you can see the API returns a Status, and the callback doesn't handle failure or success. redis++ needs the callback to take care of the error code....

[core] Support Redis with replicas

Coming back to this work after the on-call work. Interestingly, `scan` and `hscan` actually return different types of values...
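The comment above is cut off, so exactly which difference it refers to is an assumption. One well-known difference in Redis itself is the reply shape: `SCAN` replies with a cursor plus a list of keys, while `HSCAN` replies with a cursor plus a flat list of alternating fields and values. A minimal Python sketch of handling both shapes follows. The helper names `parse_scan_reply` and `parse_hscan_reply` are hypothetical and are not part of Ray or redis++; the sample replies are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_scan_reply(reply: list) -> Tuple[int, List[str]]:
    """Hypothetical helper. SCAN reply shape: [next_cursor, [key, key, ...]]."""
    cursor, keys = reply
    return int(cursor), list(keys)


def parse_hscan_reply(reply: list) -> Tuple[int, Dict[str, str]]:
    """Hypothetical helper. HSCAN reply shape: [next_cursor, [field, value, ...]]."""
    cursor, flat = reply
    if len(flat) % 2 != 0:
        raise ValueError("HSCAN payload must hold field/value pairs")
    # Pair up even-indexed fields with odd-indexed values.
    return int(cursor), dict(zip(flat[0::2], flat[1::2]))


# Illustrative raw replies (not taken from a real server).
scan_reply = ["17", ["key:1", "key:2"]]
hscan_reply = ["0", ["field1", "value1", "field2", "value2"]]

scan_cursor, keys = parse_scan_reply(scan_reply)
hscan_cursor, fields = parse_hscan_reply(hscan_reply)
```

In both commands a returned cursor of 0 means the iteration is complete, so a caller would loop until the parsed cursor is 0.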

[core] Support Redis with replicas

Seems only the cpp test failure is related. Almost there.