[Bug] Autoscaler fails if the raycluster is installed by helm with autoscaler v2 enabled and additionalWorkerGroups
Search before asking
- [x] I searched the issues and found no similar issues.
KubeRay Component
Others
What happened + What you expected to happen
The autoscaler reports an error when the RayCluster is installed via Helm with autoscaler v2 enabled and additionalWorkerGroups specified.
The autoscaler error log is as follows:
The resources are missing from the additional worker group (the second entry under "Worker Group Specs"), whereas the first group's resources are present, even though neither group's resources are specified in the YAML.
If resources are specified for the additional worker group, the RayCluster runs correctly.
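To confirm which worker groups ended up without resources, something like the following could be run against the RayCluster object once it is parsed into a dict (for example from `kubectl get raycluster -o json`). This is only a sketch: `groups_without_resources` is a hypothetical helper, and the `example` dict is a hand-written stand-in for the real object, with an illustrative container name and resource values.

```python
from typing import Any


def groups_without_resources(raycluster: dict[str, Any]) -> list[str]:
    """Return the names of worker groups that have a container with
    neither resource requests nor resource limits."""
    missing = []
    for group in raycluster.get("spec", {}).get("workerGroupSpecs", []):
        containers = group.get("template", {}).get("spec", {}).get("containers", [])
        for container in containers:
            # "resources" may be absent, null, or an empty mapping.
            resources = container.get("resources") or {}
            if not resources.get("requests") and not resources.get("limits"):
                missing.append(group.get("groupName", "<unnamed>"))
                break
    return missing


# Hand-written stand-in for the rendered object (values are illustrative).
example = {
    "spec": {
        "workerGroupSpecs": [
            {
                "groupName": "standard-worker",
                "template": {"spec": {"containers": [{
                    "name": "ray-worker",
                    "resources": {
                        "limits": {"cpu": "1", "memory": "1G"},
                        "requests": {"cpu": "1", "memory": "1G"},
                    },
                }]}},
            },
            {
                "groupName": "additional-worker-group1",
                # No "resources" key, as observed for the additional group.
                "template": {"spec": {"containers": [{"name": "ray-worker"}]}},
            },
        ]
    }
}

print(groups_without_resources(example))  # ['additional-worker-group1']
```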
Reproduction script
Run the following command with the yaml file below:
helm install raycluster kuberay/ray-cluster -f raycluster_helm.yaml
raycluster_helm.yaml:
image:
  repository: rayproject/ray
  tag: 2.46.0
head:
  rayStartParams:
    num-cpus: "0"
  enableInTreeAutoscaling: true
  autoscalerOptions:
    version: v2
    upscalingMode: Default
    idleTimeoutSeconds: 600 # 10 minutes
  resources:
    limits:
      cpu: 1
      memory: 4G
    requests:
      cpu: 1
      memory: 4G
worker:
  groupName: standard-worker
  replicas: 0
  minReplicas: 0
  maxReplicas: 5
additionalWorkerGroups:
  additional-worker-group1:
    image:
      repository: rayproject/ray
      tag: 2.46.0
      pullPolicy: IfNotPresent
    disabled: false
    replicas: 0
    minReplicas: 0
    maxReplicas: 5
    # resources:
    #   limits:
    #     cpu: "1"
    #     memory: "1G"
    #   requests:
    #     cpu: "1"
    #     memory: "1G"
The RayCluster runs correctly if the resources block at the bottom is uncommented.
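As a stopgap until the chart handles this, the values could be checked before running `helm install`. The sketch below works on the values already parsed into a dict. `additional_groups_missing_resources` is a hypothetical helper, not part of KubeRay or Helm, and skipping groups with `disabled: true` is an assumption about how the chart treats that key.

```python
from typing import Any


def additional_groups_missing_resources(values: dict[str, Any]) -> list[str]:
    """Return the names of enabled additionalWorkerGroups entries that
    do not define a non-empty "resources" block."""
    missing = []
    for name, group in (values.get("additionalWorkerGroups") or {}).items():
        group = group or {}
        # Assumption: disabled groups are not rendered, so they are skipped.
        if group.get("disabled", False):
            continue
        if not group.get("resources"):
            missing.append(name)
    return missing


# Mirrors the additionalWorkerGroups section of raycluster_helm.yaml.
values = {
    "additionalWorkerGroups": {
        "additional-worker-group1": {
            "disabled": False,
            "replicas": 0,
            "minReplicas": 0,
            "maxReplicas": 5,
        }
    }
}
print(additional_groups_missing_resources(values))  # ['additional-worker-group1']

# Equivalent to uncommenting the resources block in the YAML.
values["additionalWorkerGroups"]["additional-worker-group1"]["resources"] = {
    "limits": {"cpu": "1", "memory": "1G"},
    "requests": {"cpu": "1", "memory": "1G"},
}
print(additional_groups_missing_resources(values))  # []
```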
Anything else
No response
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
We should specify resources in the manifest.
Hi may I take this?
Hi @400Ping, are you still working on this? I’d be happy to take it over if you’re okay with that.
Ok, go ahead.
Thanks! I'll go on and take this one.