shobhit_n

Results 13 comments of shobhit_n

@tariq1890 Please find the pod listing for the GPU node when all NVIDIA pods are in the Running state:

```
k get po -n gpu-operator -o wide | grep -i ip-10-222-100-91.ec2.internal
gpu-feature-discovery-zzkqg 1/1 Running...
```

@tariq1890 For example, on direct termination of the backing EC2 instance: up to k8s 1.24 that removed all of these NVIDIA pods, but on k8s 1.26 these 4 pods still show as Running even...

@tariq1890 @cdesiniotis @shivamerla Please let me know how to fix this issue. DaemonSet pods are not getting cleaned up when the cluster autoscaler terminates a node. This would ideally have removed all nvidia...

@shivamerla @tariq1890 @cdesiniotis Could you please help us fix this behaviour? Because of it, the namespace unnecessarily shows pods that no longer exist, since the node has already been scaled...
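Until the root cause is fixed, one possible workaround is to clean up pods that still reference nodes the API server no longer knows about. This is only a sketch, not anything from the thread: it assumes the official `kubernetes` Python client and the `gpu-operator` namespace mentioned above, and it separates the pure filtering logic from the cluster calls so the logic can be checked without a cluster.

```python
def stale_pods(pods, live_nodes):
    """Return names of pods scheduled on nodes that no longer exist.

    `pods` is a list of (pod_name, node_name) tuples; `live_nodes` is the
    set of node names currently registered with the API server.
    """
    return [name for name, node in pods if node not in live_nodes]


def cleanup(namespace="gpu-operator"):
    """Live-cluster usage sketch (requires a kubeconfig and
    `pip install kubernetes`); names here are assumptions, not values
    confirmed by the thread."""
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    live = {n.metadata.name for n in v1.list_node().items}
    pods = [(p.metadata.name, p.spec.node_name)
            for p in v1.list_namespaced_pod(namespace).items]
    for name in stale_pods(pods, live):
        # Force-delete, mirroring what PodGC should do for orphaned pods.
        v1.delete_namespaced_pod(name, namespace, grace_period_seconds=0)
```

This is a stopgap; the underlying PodGC/DaemonSet behaviour on 1.26 still needs the fix discussed above.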

@shivamerla Yes, we are using a private registry. Please find the controller-manager log errors:

```
I1110 02:29:52.043817 1 gc_controller.go:329] "PodGC is force deleting Pod" pod="gpu-operator/gpu-feature-discovery-krj5j"
E1110 02:29:52.048104 1 gc_controller.go:255]...
```

@cartermckinnon We followed the steps below on an existing 1.26 cluster to make it ready for the 1.27 upgrade:

```
On existing version 1.26
Add tag to each node [kubernetes.io/cluster/cluster-name: owned
k...
```
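The node-tagging step above can be sketched with the AWS CLI; the instance ID and `cluster-name` here are placeholders, not values from the thread:

```
# Hypothetical example: tag the EC2 instance backing a node so the AWS
# cloud provider associates it with the cluster. Replace the instance ID
# and "cluster-name" with your own values.
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/cluster-name,Value=owned
```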

@cartermckinnon Let me share the 10-kubeadm-conf and kubeadm-config we currently have on 1.26, where in-tree cloud provider support is present:

```
10-kubeadm-conf
# Note: This dropin only works with...
```
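Since the drop-in above is truncated, for reference this is what the stock kubeadm kubelet drop-in looks like upstream; a sketch only, with the `--cloud-provider` note added as an assumption about where in-tree vs. external support is selected:

```
# Sketch of the stock kubeadm kubelet drop-in
# (/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf).
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# kubeadm writes --cloud-provider into kubeadm-flags.env: "aws" for the
# in-tree provider, "external" once migrated to cloud-provider-aws.
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
```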