[Bug] Autoscaler fails if the raycluster is installed by helm with autoscaler v2 enabled and additionalWorkerGroups

Open fscnick opened this issue 5 months ago • 2 comments

Search before asking

  • [x] I searched the issues and found no similar issues.

KubeRay Component

Others

What happened + What you expected to happen

The autoscaler reports an error when the RayCluster is installed with Helm, autoscaler v2 is enabled, and additionalWorkerGroups are specified. Image

The autoscaler's error log is as follows: Image

The resources are missing from the additionalWorkerGroups entry (the second one under "Worker Group Specs"), but they are present on the first worker group, even though neither of them is specified in the YAML.

Image

If resources are specified for the additionalWorkerGroups entry, the RayCluster runs correctly.

Reproduction script

Run the following command with the yaml file below:

helm install raycluster kuberay/ray-cluster -f raycluster_helm.yaml

raycluster_helm.yaml:

image:
  repository: rayproject/ray
  tag: 2.46.0
head:
  rayStartParams:
    num-cpus: "0"
  enableInTreeAutoscaling: true
  autoscalerOptions:
    version: v2
    upscalingMode: Default
    idleTimeoutSeconds: 600 # 10 minutes
  resources:
    limits:
      cpu: 1
      memory: 4G
    requests:
      cpu: 1
      memory: 4G
worker:
  groupName: standard-worker
  replicas: 0
  minReplicas: 0
  maxReplicas: 5
additionalWorkerGroups:
  additional-worker-group1:
    image:
      repository: rayproject/ray
      tag: 2.46.0
      pullPolicy: IfNotPresent
    disabled: false
    replicas: 0
    minReplicas: 0
    maxReplicas: 5
#    resources:
#      limits:
#        cpu: "1"
#        memory: "1G"
#      requests:
#        cpu: "1"
#        memory: "1G"

The RayCluster runs correctly if the resources block at the bottom is uncommented.
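To check which worker groups ended up without resources after the install, something like the helper below may be useful. It is only a sketch: the function name `show_worker_group_resources` is hypothetical, and the RayCluster name used in the usage note (`raycluster-kuberay`) is an assumption about how the chart names the resource for the release above; `kubectl get rayclusters` shows the actual name.

```shell
# Hypothetical helper: print each worker group's name next to the
# resources of its first container, one group per line.
# Usage: show_worker_group_resources <raycluster-name> [namespace]
show_worker_group_resources() {
  local name="$1"
  local namespace="${2:-default}"   # assumes the default namespace if none is given
  kubectl get raycluster "$name" -n "$namespace" \
    -o jsonpath='{range .spec.workerGroupSpecs[*]}{.groupName}{"\t"}{.template.spec.containers[0].resources}{"\n"}{end}'
}
```

For example, `show_worker_group_resources raycluster-kuberay` (name assumed, see above). A group whose resources column is empty would correspond to the missing resources described in this issue.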

Anything else

No response

Are you willing to submit a PR?

  • [x] Yes I am willing to submit a PR!

fscnick · Nov 19 '25 14:11

We should specify resources in the manifest.

Future-Outlier · Nov 22 '25 01:11

Hi, may I take this?

400Ping · Nov 28 '25 11:11

Hi @400Ping, are you still working on this? I’d be happy to take it over if you’re okay with that.

justinyeh1995 · Dec 23 '25 07:12

Ok, go ahead.

400Ping · Dec 23 '25 07:12

Thanks! I'll go on and take this one.

justinyeh1995 · Dec 23 '25 08:12