
Recent changes to v1_node_condition.py causing ValueError for MCEHardwareErrors on Oracle OKE cloud platform

Open koaps opened this issue 3 years ago • 9 comments

Changes for 23.3.0: https://github.com/kubernetes-client/python/commit/b227345fb27fdee65b3218e0d7af18748d78469d

It includes a hard-coded set of allowed conditions that may not cover all conditions reported by different cloud providers:

https://github.com/kubernetes-client/python/blob/master/kubernetes/client/models/v1_node_condition.py#L217

Oracle's OKE cloud platform reports an additional condition: MCEHardwareErrors.

This causes a ValueError when the 23.3.0 client is used to list nodes:

  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/models/v1_node_condition.py", line 221, in type
    .format(type, allowed_values)
ValueError: Invalid value for `type` (MCEHardwareErrors), must be one of ['DiskPressure', 'MemoryPressure', 'NetworkUnavailable', 'PIDPressure',

Node conditions should be more dynamic based on the cloud provider used.
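For illustration, the failure can be reproduced without a cluster. The generated model validates the `type` setter against a fixed allow-list; the sketch below follows the error message in the traceback above, and is a simplified stand-in, not the actual library source:

```python
# Simplified sketch of the validation pattern in the generated model
# (names follow the traceback above; this is an illustration, not the
# real kubernetes-client code).
class V1NodeConditionSketch:
    def __init__(self, type=None):
        self._type = None
        if type is not None:
            self.type = type

    @property
    def type(self):
        return self._type

    @type.setter
    def type(self, type):
        # Hard-coded allow-list; provider-specific conditions are absent.
        allowed_values = ["DiskPressure", "MemoryPressure",
                          "NetworkUnavailable", "PIDPressure", "Ready"]
        if type not in allowed_values:
            raise ValueError(
                "Invalid value for `type` ({0}), must be one of {1}"
                .format(type, allowed_values)
            )
        self._type = type

# A provider-specific condition such as MCEHardwareErrors trips the check:
try:
    V1NodeConditionSketch(type="MCEHardwareErrors")
except ValueError as e:
    print(e)  # prints the ValueError message
```

Because the check runs during response deserialization, any list-nodes call against a cluster that reports an extra condition fails, even if the caller never inspects that field.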

koaps avatar Mar 01 '22 23:03 koaps

Same for GCP/GKE:

.../lib/python3.9/site-packages/kubernetes/client/models/v1_pod_readiness_gate.py", line 78, in condition_type
    raise ValueError(
ValueError: Invalid value for `condition_type` (cloud.google.com/load-balancer-neg-ready), must be one of ['ContainersReady', 'Initialized', 'PodScheduled', 'Ready']

This is about pod conditions or pod readiness gates, but it looks to me like it has the same root cause as the node condition issue of @koaps.

Edit: it first appeared after upgrading from 21.7.0 to 23.3.0.

julianseeger avatar Mar 02 '22 11:03 julianseeger

I have the same issue:

File "/usr/lib/python3.6/site-packages/kubernetes/client/models/v1_node_condition.py", line 221, in type

[2022-03-02T09:48:00.771Z] E                 .format(type, allowed_values)

[2022-03-02T09:48:00.771Z] E             ValueError: Invalid value for `type` (FrequentDockerRestart), must be one of ['DiskPressure', 'MemoryPressure', 'NetworkUnavailable', 'PIDPressure', 'Ready']

chrislinan avatar Mar 02 '22 13:03 chrislinan

Piling on here. I'm getting a similar error on AWS in combination with the AWS load balancer controller, which injects a readiness gate into pods. Below is the error I see:

File "../lib/python3.9/site-packages/kubernetes/client/models/v1_pod_readiness_gate.py", line 52, in __init__
   self.condition_type = condition_type
 File "../lib/python3.9/site-packages/kubernetes/client/models/v1_pod_readiness_gate.py", line 78, in condition_type
   raise ValueError(
ValueError: Invalid value for `condition_type` (target-health.elbv2.k8s.aws/k8s-<name>-b77a905f9d), must be one of ['ContainersReady', 'Initialized', 'PodScheduled', 'Ready']

kpulgam avatar Mar 03 '22 00:03 kpulgam

Same for Azure/AKS

ValueError: Invalid value for `type` (ContainerRuntimeProblem), must be one of ['DiskPressure', 'MemoryPressure', 'NetworkUnavailable', 'PIDPressure', 'Ready']

sybnex avatar Mar 18 '22 10:03 sybnex

This is being fixed upstream. We will cut a new 1.23 client to backport the fix once PR https://github.com/kubernetes/kubernetes/pull/108740 is merged.

roycaihw avatar Mar 28 '22 16:03 roycaihw

Is there any mitigation available while a permanent fix is being developed?

iamkarlson avatar Apr 07 '22 08:04 iamkarlson

Is there any mitigation available while a permanent fix is being developed?

We're facing the issue too, and would like to know a workaround until a new version is released.
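Until a fixed release lands, one stopgap is to replace the generated model's validating `type` setter with a permissive one before any API responses are deserialized. The sketch below demonstrates the pattern on a minimal stand-in class; for the real client you would patch `V1NodeCondition` in `kubernetes.client.models.v1_node_condition` (and similarly `V1PodReadinessGate`) the same way — an assumption about your installed version, so verify the property name first:

```python
# Stand-in for the generated model, used here so the patch pattern is
# self-contained (the real target would be V1NodeCondition).
class V1NodeConditionStandIn:
    ALLOWED = ["DiskPressure", "MemoryPressure", "NetworkUnavailable",
               "PIDPressure", "Ready"]

    def __init__(self, type=None):
        self._type = None
        if type is not None:
            self.type = type

    @property
    def type(self):
        return self._type

    @type.setter
    def type(self, value):
        if value not in self.ALLOWED:
            raise ValueError("Invalid value for `type` ({0})".format(value))
        self._type = value


def _permissive_type_setter(self, value):
    # Accept any condition type; skip the hard-coded allow-list.
    self._type = value


# Patch the class-level property before deserializing any responses.
V1NodeConditionStandIn.type = V1NodeConditionStandIn.type.setter(
    _permissive_type_setter
)

cond = V1NodeConditionStandIn(type="MCEHardwareErrors")
print(cond.type)  # MCEHardwareErrors
```

The trade-off is that the patch disables enum validation for every instance in the process, so it should be treated as a temporary mitigation rather than a fix.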

everpeace avatar Apr 11 '22 03:04 everpeace

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 10 '22 03:07 k8s-triage-robot

/remove-lifecycle stale

iamkarlson avatar Jul 14 '22 09:07 iamkarlson

@roycaihw has this been backported?

stan-sz avatar Sep 27 '22 09:09 stan-sz

I see the change that broke this was reverted in https://github.com/kubernetes-client/python/pull/1789

SergeyKanzhelev avatar Nov 10 '22 21:11 SergeyKanzhelev


/lifecycle stale

k8s-triage-robot avatar Feb 08 '23 22:02 k8s-triage-robot

/remove-lifecycle stale

iamkarlson avatar Feb 23 '23 21:02 iamkarlson

/remove-lifecycle stale

Do you still see this issue? I haven't checked the latest release myself recently, but last time I checked, the code was fixed.

SergeyKanzhelev avatar Feb 23 '23 21:02 SergeyKanzhelev


/lifecycle stale

k8s-triage-robot avatar May 24 '23 21:05 k8s-triage-robot


/lifecycle rotten

k8s-triage-robot avatar Jun 23 '23 22:06 k8s-triage-robot


/close not-planned

k8s-triage-robot avatar Jan 19 '24 01:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 19 '24 01:01 k8s-ci-robot