[Bug] KubeRay Operator pod fails to start when using --enable-metrics with helm chart v1.3.2
### Search before asking
- [x] I searched the issues and found no similar issues.
### KubeRay Component
ray-operator
### What happened + What you expected to happen
The KubeRay operator deployment fails to start when `--enable-metrics` is included in the argument list.

The flag comes from the following lines in the chart's deployment.yaml template:
https://github.com/ray-project/kuberay/blob/bc2e2c6bb0363ae17a32e4f3a3afb0dd2555c573/helm-chart/kuberay-operator/templates/deployment.yaml#L108-L110
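For context, lines like the ones linked above typically gate a flag behind a chart value. The sketch below shows only the general shape such a conditional takes in a Helm deployment template; the values key `metrics.enabled` is an assumption, not necessarily the chart's actual key (see the linked deployment.yaml for the real source):

```yaml
# Sketch only; key names are assumptions, not the chart's actual source.
{{- if .Values.metrics.enabled }}
- --enable-metrics={{ .Values.metrics.enabled }}
{{- end }}
```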
### Reproduction script
Example arguments:

```yaml
- args:
  - >-
    --feature-gates=RayClusterStatusConditions=true,RayJobDeletionPolicy=false
  - '--enable-leader-election=true'
  - '--enable-metrics=true'
```
The pod fails to start with:

```
flag provided but not defined: -enable-metrics
Usage of /manager:
  -batch-scheduler string
        Batch scheduler name, supported values are volcano and yunikorn.
  -config string
        Path to structured config file. Flags are ignored if config file is set.
  -enable-batch-scheduler
        (Deprecated) Enable batch scheduler. Currently is volcano, which supports gang scheduler policy. Please use --batch-scheduler instead.
  -enable-leader-election
        Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager. (default true)
  -feature-gates string
        A set of key=value pairs that describe feature gates. E.g. FeatureOne=true,FeatureTwo=false,...
  -forced-cluster-upgrade
        (Deprecated) Forced cluster upgrade flag
  -health-probe-bind-address string
        The address the probe endpoint binds to. (default ":8082")
  -kubeconfig string
        Paths to a kubeconfig. Only required if out-of-cluster.
  -leader-election-namespace string
        Namespace where the leader election resource lives. Defaults to the pod namespace if not set.
  -log-file-encoder string
        Encoder to use for log file. Valid values are 'json' and 'console'. Defaults to 'json' (default "json")
  -log-file-path string
        Synchronize logs to local file
  -log-stdout-encoder string
        Encoder to use for logging stdout. Valid values are 'json' and 'console'. Defaults to 'json' (default "json")
  -metrics-addr string
        The address the metric endpoint binds to. (default ":8080")
  -reconcile-concurrency int
        max concurrency for reconciling (default 1)
  -use-kubernetes-proxy
        Use Kubernetes proxy subresource when connecting to the Ray Head node.
  -watch-namespace string
        Specify a list of namespaces to watch for custom resources, separated by commas. If left empty, all namespaces will be watched.
  -zap-devel
        Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
  -zap-encoder value
        Zap log encoding (one of 'json' or 'console')
  -zap-log-level value
        Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
  -zap-stacktrace-level value
        Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
  -zap-time-encoding value
        Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.
```
### Anything else
No response
### Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Thank you for reporting. Let me take a look.
It's weird that the flag support was not checked in to 1.3.2. In this PR, I added both the Helm chart and the operator flag support. However, when I checked the v1.3.2 tag, only the Helm chart update is there, not the operator side. @kevin85421 do you see any potential issue with the v1.3.2 release?
Sorry, I was wrong. It looks like Helm chart v1.3.2 also does not have this flag. Can you confirm which version of the Helm chart you were using? Thank you!
I'm using v1.3.2
I think you need to follow the steps in https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md#run-the-operator-inside-the-cluster to run the latest version of the operator and Helm chart, because Helm chart v1.3.2 does not support the `--enable-metrics` flag. This flag is supported in v1.4.0 or later.
But the Helm chart should still work instead of failing to start.
Let me try it out with v1.3.2
Hi @cmontemuino, I just installed the KubeRay operator with Helm chart v1.3.2 and things work fine. Below are the commands I tested:
```shell
kind create cluster --image=kindest/node:v1.26.0
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.3.2
```
There is no `--enable-metrics` in the deployed container args:
```yaml
spec:
  containers:
  - args:
    - --feature-gates=RayClusterStatusConditions=true,RayJobDeletionPolicy=false
    - --enable-leader-election=true
```
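To confirm which flags the rendered Deployment actually passes, one hedged approach is to grep the manifest for the flag before applying it. In the sketch below, the `manifest.yaml` content is a local stand-in for real output you would produce with `helm template kuberay-operator kuberay/kuberay-operator --version 1.3.2`, or dump from a live cluster with `kubectl get deployment kuberay-operator -o yaml`:

```shell
# Write a stand-in manifest; in practice, render it with `helm template`
# or dump it with `kubectl get deployment ... -o yaml`.
cat > manifest.yaml <<'EOF'
spec:
  containers:
  - args:
    - --feature-gates=RayClusterStatusConditions=true,RayJobDeletionPolicy=false
    - --enable-leader-election=true
EOF

# `--` ends option parsing so the leading dashes are treated literally.
if grep -q -- '--enable-metrics' manifest.yaml; then
  echo "flag present"
else
  echo "flag absent"
fi
```

Against args like the v1.3.2 snippet above, this prints `flag absent`, which is the expected state for that chart version.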
Do you mind sharing your steps to reproduce the issue? Thank you!