az aks command invoke: does not work if user nodes have taints
Describe the bug
Command Name
az aks command invoke -n $AKS_NAME -c "kubectl cluster-info"
Errors:
(KubernetesOperationError) Failed to run command due to cluster perf issue, container command-0be71db980254f398cdecce07419fbed in aks-command namespace did not start within 30s on your cluster, retry may helps. If issue persist, you may need to tune your cluster with better performance (larger node/paid tier).
Code: KubernetesOperationError
Message: Failed to run command due to cluster perf issue, container command-0be71db980254f398cdecce07419fbed in aks-command namespace did not start within 30s on your cluster, retry may helps. If issue persist, you may need to tune your cluster with better performance (larger node/paid tier).
Event Message:
0/3 nodes are available: 1 node(s) had untolerated taint {agentpool: user}, 2 node(s) had untolerated taint {CriticalAddonsOnly: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
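The event message is what actually explains the failure: every node carries a taint that the command pod does not tolerate. As a small sketch (the helper name `list_untolerated_taints` is hypothetical, and the event text is simply copied from the report above), the blocking taints can be pulled out of such a message like this:

```shell
#!/usr/bin/env bash
# Event text copied verbatim from the report above.
event='0/3 nodes are available: 1 node(s) had untolerated taint {agentpool: user}, 2 node(s) had untolerated taint {CriticalAddonsOnly: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.'

# Hypothetical helper: prints one "key: value" taint per line.
list_untolerated_taints() {
  printf '%s\n' "$1" \
    | grep -o 'untolerated taint {[^}]*}' \
    | sed -e 's/^untolerated taint {//' -e 's/}$//'
}

list_untolerated_taints "$event"
```

For the message in this report, that should list the `agentpool: user` and `CriticalAddonsOnly: true` taints, which together cover all three nodes.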
To Reproduce:
Steps to reproduce the behavior. Note that argument values have been redacted, as they may contain sensitive information.
- create a user nodepool with a taint "agentpool=user:NoSchedule"
- try to execute command:
  az aks command invoke -n NAME -c "kubectl cluster-info"
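The two steps above could be scripted roughly as follows. This is only a sketch under assumptions: the resource group, cluster name, and pool name `userpool` are placeholders that do not come from the report, and the system pool is assumed to already carry the `CriticalAddonsOnly=true` taint shown in the event message. By default the script only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to actually run them against a test cluster.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions, not taken from the report).
RG="${RG:-my-resource-group}"
AKS_NAME="${AKS_NAME:-my-aks-cluster}"

# Print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

repro() {
  # Step 1: add a user node pool with the taint from the report.
  run az aks nodepool add \
    --resource-group "$RG" \
    --cluster-name "$AKS_NAME" \
    --name userpool \
    --mode User \
    --node-taints "agentpool=user:NoSchedule"

  # Step 2: invoke a command; the command pod has no tolerations,
  # so it is expected to stay unscheduled and hit the 30s timeout.
  run az aks command invoke \
    --resource-group "$RG" \
    --name "$AKS_NAME" \
    --command "kubectl cluster-info"
}

repro
```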
Expected Behavior
aks command invoke should be able to start on system nodes with the default taint: CriticalAddonsOnly=true
Environment Summary
Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with, Alpine Linux v3.17
Python 3.10.9
Installer: PIP
azure-cli 2.44.1
Extensions:
account 0.2.5
Dependencies:
msal 1.20.0
azure-mgmt-resource 21.1.0b1
Additional Context
route to CXP team
@jetnet The underlying REST API for this command schedules a pod without any tolerations by default. Ideally, non-critical workloads should not be deployed on system nodes, since they could starve critical workloads of resources.
That being said, the best path is to create a feature request for toleration support, which would unblock situations like this one.
Since the Azure CLI itself has no control over this, nothing can be done on the CLI side; the command should gain this capability once the underlying REST API supports it.
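Because a pod without tolerations can only be scheduled onto nodes that have no `NoSchedule` or `NoExecute` taints, one way to check whether a cluster can host the command pod at all is to list the node pools that declare no taints. The sketch below assumes the `nodeTaints` property returned by `az aks nodepool list`; the function name `untainted_pools` is hypothetical, and the check is approximate because a pool with only `PreferNoSchedule` taints would still be usable but is filtered out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of node pools with no taints.
# Usage: untainted_pools <resource-group> <cluster-name>
untainted_pools() {
  # JMESPath treats null and empty arrays as false, so "!nodeTaints"
  # matches pools whose nodeTaints is unset or empty.
  az aks nodepool list \
    --resource-group "$1" \
    --cluster-name "$2" \
    --query '[?!nodeTaints].name' \
    --output tsv
}

# Example call (placeholder names):
#   untainted_pools my-resource-group my-aks-cluster
```

If this prints nothing, every pool is tainted, which matches the "0/3 nodes are available" event in the report.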
Hi @jetnet. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.
@PramodValavala-MSFT, really appreciate your clarification. Should I create a feature request or are you going to do that? Thanks!
Hi @jetnet, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.
/unresolve
I think this is an issue with the current implementation and NOT a feature request. You cannot run az aks command invoke if your AKS user nodes have a taint. That's not OK.
Please re-open. Thanks!
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/aks-pm.
Issue Details
| Author: | jetnet |
|---|---|
| Assignees: | - |
| Labels: | |
| Milestone: | - |
@jetnet Apologies for the delay on this one! Since this requires a service-side change, I will be reassigning this case to the responsible team and sharing the feedback with them internally.
Is there a workaround for this issue?
Any updates?
Any updates on this?
@PramodValavala-MSFT any updates on this?