
A Few Issues found on Kubernetes

Open • ericxu10101 opened this issue 5 years ago • 5 comments

I am following the k8s section of https://uber.github.io/fiber/getting-started/ and ran into the following issues:

  1. It seems like a k8s 'Job' only works in the 'default' namespace? When I try a different namespace, the master pod keeps failing and being recreated.

  2. Pool worker pods terminate with 'Failed' status, while the master pod returns 'Success'. Is there any way to address that?

  3. It looks like the k8s 'Job' must have an explicit 'name' instead of 'generateName'; otherwise the master pod throws a 'Pod not found' error. Is this a known issue?

Thanks

ericxu10101 • Aug 08 '20 05:08

Hi @ericxu10101 , let me reply those questions inline:

  1. It seems like a k8s 'Job' only works in the 'default' namespace? When I try a different namespace, the master pod keeps failing and being recreated.

That's true. Currently k8s jobs only work in the 'default' namespace. It's a limitation of the current version. I'm planning to add namespace support in the next version. Or, if you have time, you can submit a PR for that.
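For context on item 1: inside a pod, Kubernetes normally mounts the pod's own namespace as a plain-text file at `/var/run/secrets/kubernetes.io/serviceaccount/namespace` (part of the service-account volume, when token automounting is enabled). A backend could read that file instead of assuming `default`. The sketch below is an illustration under that assumption, not Fiber's code; `current_namespace` is a hypothetical helper.

```python
from pathlib import Path

# Standard path where Kubernetes mounts the pod's namespace (service-account volume).
SA_NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def current_namespace(path: Path = SA_NAMESPACE_FILE, fallback: str = "default") -> str:
    """Return the namespace this pod runs in, or `fallback` when it can't be read.

    Hypothetical helper: outside a cluster (or with automounting disabled)
    the file is missing, so we fall back instead of raising.
    """
    try:
        namespace = path.read_text().strip()
    except OSError:
        return fallback
    # An empty file is treated the same as a missing one.
    return namespace or fallback
```

The fallback keeps local runs working while letting in-cluster runs pick up whatever namespace the job was submitted to.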

  2. Pool worker pods terminate with 'Failed' status, while the master pod returns 'Success'. Is there any way to address that?

The pods most likely failed because of the daemon thread inside each worker pod, which terminates the worker when it loses its connection to the master. That doesn't mean the whole job failed; it only means the worker exited in a special way, and it shouldn't affect the final result (if you notice otherwise, please create a new issue for it). I'm planning to address this in the next version so that the exit status of worker pods is set correctly.
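On item 2: Kubernetes reports a pod as `Succeeded` only when all its containers exit with status 0, and as `Failed` when they have all terminated and at least one exited non-zero (with `restartPolicy: Never`). So the worker's exit code alone decides the pod status. A minimal sketch, assuming the watchdog can tell an orderly master shutdown from a lost connection; `start_watchdog`, `master_alive` and `master_finished` are hypothetical names, not Fiber APIs.

```python
import os
import threading
import time
from typing import Callable


def start_watchdog(
    master_alive: Callable[[], bool],
    master_finished: Callable[[], bool],
    exit_fn: Callable[[int], None] = os._exit,
    interval: float = 1.0,
) -> threading.Thread:
    """Poll the master connection and end the worker once it is gone.

    Exits with 0 if the master shut down after finishing its work (pod ends
    Succeeded) and with 1 if the connection was lost unexpectedly (pod ends
    Failed). os._exit is the default because it terminates the whole process
    even from a non-main thread; sys.exit here would only end this thread.
    """

    def run() -> None:
        while master_alive():
            time.sleep(interval)
        exit_fn(0 if master_finished() else 1)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

Injecting `exit_fn` keeps the exit-code decision testable without actually killing the process.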

  3. It looks like the k8s 'Job' must have an explicit 'name' instead of 'generateName'; otherwise the master pod throws a 'Pod not found' error. Is this a known issue?

Currently 'name' is required for Fiber on K8s to work. Do you have a use case for 'generateName'? If it's general enough, it can be supported later.
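On item 3: with `metadata.generateName`, the API server appends a random suffix to the prefix to form the real name, so anything that later looks the pod up by the configured name won't find it, which would plausibly explain the 'Pod not found' error. One common way around this is to let the pod learn its actual name through the Downward API. A sketch under that assumption; `master_pod_manifest` and `own_pod_name` are hypothetical helpers, and the manifest is a plain Pod dict, not a Fiber job spec.

```python
import os
import socket
from typing import Any, Dict, List, Mapping


def master_pod_manifest(prefix: str, image: str, command: List[str]) -> Dict[str, Any]:
    """Build a Pod manifest that uses generateName instead of a fixed name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # The API server appends a random suffix to this prefix.
        "metadata": {"generateName": prefix},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "master",
                    "image": image,
                    "command": command,
                    # Downward API: expose the final, generated name to the container.
                    "env": [
                        {
                            "name": "POD_NAME",
                            "valueFrom": {"fieldRef": {"fieldPath": "metadata.name"}},
                        }
                    ],
                }
            ],
        },
    }


def own_pod_name(env: Mapping[str, str] = os.environ) -> str:
    """Return this pod's real name, preferring the Downward API variable."""
    # Fallback: a pod's hostname is normally its name unless spec.hostname is set.
    return env.get("POD_NAME") or socket.gethostname()
```

The official `kubernetes` Python client's `create_namespaced_pod(namespace, body)` accepts a dict like this as `body`, and the returned object's `metadata.name` holds the generated name.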

calio • Aug 11 '20 00:08

@ericxu10101 issues 1 and 2 have been fixed in #29 and #28. Feel free to install the newest version from the current master branch and test it out. Regarding issue 3, do you have a further explanation of the details of the issue?

calio • Aug 21 '20 06:08

@calio Thanks a lot, that's awesome! 3 is not actually an issue; I just wanted to confirm the behavior. So it seems that giving an explicit name instead of using 'generateName' is a current limitation.

I think supporting 'generateName' could be a future feature. It would be handy when scheduling the Job.

Feel free to close this issue. And let me know where and how to track feature requests. Thanks.

ericxu10101 • Sep 01 '20 22:09

Hi @ericxu10101, currently all features are tracked with GitHub issues. You can create a new issue with the "enhancement" label and track its progress there.

calio • Sep 03 '20 05:09

I'm getting a urllib3 "unable to establish connection" error in process.py with the pi-estimation example, and I'm unable to resolve it. I tried granting all the permissions, but nothing seems to work. I also get the following error when running the fiber CLI:

  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/cli.py", line 395, in run
    job = k8s_backend.create_job(job_spec)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/kubernetes_backend.py", line 171, in create_job
    raise e
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/kubernetes_backend.py", line 168, in create_job
    self.default_namespace, body
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7320, in create_namespaced_pod
    return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7429, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 397, in request
    body=body)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 280, in POST
    body=body)
  File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (501)
Reason: Unsupported method ('POST')
HTTP response headers: HTTPHeaderDict({'Server': 'BaseHTTP/0.3 Python/2.7.13', 'Date': 'Sun, 21 Feb 2021 11:26:45 GMT', 'Connection': 'close', 'Content-Type': 'text/html'})
HTTP response body: <head>
<title>Error response</title>
</head>
<body>
<h1>Error response</h1>
<p>Error code 501.
<p>Message: Unsupported method ('POST').
<p>Error code explanation: 501 = Server does not support this operation.
</body>

Does anyone have an idea what is happening here? Is it a bug?
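A clue is in the response headers above: `Server: BaseHTTP/0.3 Python/2.7.13` is the banner of Python 2.7's built-in `BaseHTTPServer`, and a 501 with `Unsupported method ('POST')` is what that server sends when its handler defines no `do_POST`. So the `create_namespaced_pod` POST apparently reached a plain Python HTTP server rather than the Kubernetes API server (an inference from the headers, not confirmed in this thread). A quick check, assuming the CLI uses the local kubeconfig, is which server URL the current context points to; `kubectl config view --minify` prints it, and the sketch below does the same for a kubeconfig already loaded into a dict. `current_cluster_server` is a hypothetical helper, and it assumes a single kubeconfig file in the standard layout (no merged `KUBECONFIG` paths).

```python
from typing import Any, Dict


def current_cluster_server(kubeconfig: Dict[str, Any]) -> str:
    """Return the API server URL of the kubeconfig's current context."""
    context_name = kubeconfig["current-context"]
    # Find the current context, then the cluster entry it refers to.
    context = next(
        c["context"] for c in kubeconfig["contexts"] if c["name"] == context_name
    )
    cluster = next(
        c["cluster"] for c in kubeconfig["clusters"] if c["name"] == context["cluster"]
    )
    return cluster["server"]
```

With PyYAML installed (an assumption), the dict can come from `yaml.safe_load(open(os.path.expanduser("~/.kube/config")))`. If the printed URL points at a local port served by some other Python process, that would match the headers above.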

kiran-italiya • Feb 21 '21 11:02