[core] Support encrypted redis connection.
Why are these changes needed?
This PR support encrypted redis connection. A new env is introduced RAY_REDIS_ENABLE_SSL and if it's set to be true it'll use ssl with the redis.
To support better ssl, redis is also upgraded. It's not supposed to impact ray because it's the user's responsibility to provide a redis.
Right now all flags are provided by the os envs. It's not ideal and we should design a better format when we refactor the db layer.
Besides, right now gRPC is using OS envs for the encrypted connection in ray, just follow this pattern for now.
Related issue number
Closes https://github.com/ray-project/ray/issues/27371
Checks
- [ ] I've signed off every commit(by using the -s flag, i.e.,
git commit -s) in this PR. - [ ] I've run
scripts/format.shto lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
hmmm, it seems the building of redis has some issues. I'm looking into this.
sentinel.c:34:10: fatal error: openssl/ssl.h: No such file or directory
34 | #include "openssl/ssl.h"
| ^~~~~~~~~~~~~~~
I need to change the bazel rule to build redis.
@scv119 ah,, I forget to include the unit test. I actually have one but forgot to add.
@simon-mo the services change is only for testing. Ray will not start redis for the user anymore. Ideally, the Redis change should be moved from the service.py to test.py. So:
use SSL with the redis bundled/started by Ray (I assume this is supported by the services.py:_start_redis_instance change. Do users need to supply all the CERTs parameters?
It's the user's responsibility to bring a redis. How to setup the cert it depends on the Redis doc.
Which env var does the user use to connect to hosted redis (e.g. aws memorystore https://docs.aws.amazon.com/memorydb/latest/devguide/getting-startedclusters.connecttonode.html)
I'm not quite sure about AWS's use case. I think here it's more related to the Redis client itself. Everything is based on Redis protocol.
didn't mean to approve, this PR needs documentation change as well
@simon-mo right now, all the GCS Fault Tolerance features are only to support ray serve and the only place where ray serves can be HA will be in KubeRay Ray Service Operator. The users can't simply use these features by themselves.
I think we should just add the flags into the kuberay's doc. So it shouldn't be in this one. I'll add it once it's merged.
@simon-mo updated to rediss://
test failure not related.