peterghaddad

Results 32 comments of peterghaddad

Hi @richardliaw, I think the ability to pass metadata similar to [this Issue](https://github.com/ray-project/ray/pull/25946) would do the trick. Any thoughts on this?

@richardliaw I created a PR that enables the user to pass metadata as an environment variable. I think this is a great method since it is similar to setting Certs...

@DmitriGekhtman and @harryge00 This is a pretty big problem that we are continuing to experience. For example, if we have a Ray remote function `@ray.remote(num_gpus=1)` that is only called once...

@DmitriGekhtman and @harryge00, I wanted to politely follow up. Any thoughts on this functionality for having the autoscaler not try to spawn pods until they are fully terminated, not in a...
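A rough sketch of the kind of check being asked for here, not how the Ray autoscaler actually works: it assumes pod objects are available as plain dicts in the shape of Kubernetes API JSON and relies on Kubernetes setting `metadata.deletionTimestamp` once a pod has been asked to terminate. The function names `terminating_pods` and `should_defer_scale_up` are hypothetical.

```python
from typing import Any, Dict, List


def terminating_pods(pods: List[Dict[str, Any]]) -> List[str]:
    """Return the names of pods that have been marked for deletion.

    Assumption: each pod is a dict shaped like Kubernetes API JSON.
    """
    names = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        # Kubernetes sets metadata.deletionTimestamp when deletion is requested;
        # the pod object stays around until termination has completed.
        if metadata.get("deletionTimestamp"):
            names.append(metadata.get("name", "<unnamed>"))
    return names


def should_defer_scale_up(pods: List[Dict[str, Any]]) -> bool:
    """Hypothetical safeguard: hold off on new pods while any are terminating."""
    return len(terminating_pods(pods)) > 0
```

A caller could poll this before requesting new workers and retry once it returns `False`.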

I think safeguards are the way to go. @DmitriGekhtman This is with multiple types of jobs and I am submitting through the JobSubmissionSDK. This problem usually occurs when pods...

@DmitriGekhtman We are using Ray for the dynamic scalability features. The autoscaler is beneficial. Our cluster is not static since we are running on Kubernetes. The pods are in a...

```
$ ray status
======== Autoscaler status: 2022-10-11 14:32:56.726898 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 IP not yet assigned: gpu-group, waiting
 IP not yet assigned: gpu-group, waiting
 IP...
```

@DmitriGekhtman We do have lots of GPUs available. Some are MIG-partitioned, however. My configuration is correct. When we actually use Ray, replicas are set to 0 and min is <...

“Do you mean minReplicas=10, maxReplicas=12?” When we originally deploy, min is < max; however, afterwards the min value exceeds the max.
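To make the inverted bounds easier to spot, a small sanity check along these lines could be run against a worker group's settings. This is only a sketch: it assumes the group is available as a plain dict with the `replicas`, `minReplicas`, and `maxReplicas` keys mentioned in this thread, and the helper name `check_replica_bounds` is hypothetical.

```python
from typing import Any, Dict, List


def check_replica_bounds(group: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a worker group's replica settings.

    Assumption: `group` is a dict holding the replicas, minReplicas and
    maxReplicas values; missing keys are simply skipped.
    """
    problems = []
    min_r = group.get("minReplicas")
    max_r = group.get("maxReplicas")
    replicas = group.get("replicas")

    # The situation described above: the minimum ends up above the maximum.
    if min_r is not None and max_r is not None and min_r > max_r:
        problems.append(f"minReplicas ({min_r}) exceeds maxReplicas ({max_r})")
    # The current replica count should sit inside the [min, max] range.
    if replicas is not None and min_r is not None and replicas < min_r:
        problems.append(f"replicas ({replicas}) is below minReplicas ({min_r})")
    if replicas is not None and max_r is not None and replicas > max_r:
        problems.append(f"replicas ({replicas}) is above maxReplicas ({max_r})")
    return problems


if __name__ == "__main__":
    for problem in check_replica_bounds({"replicas": 0, "minReplicas": 12, "maxReplicas": 10}):
        print(problem)
```

An empty list means the three values are mutually consistent.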