A Few Issues found on Kubernetes
I am following the k8s part of https://uber.github.io/fiber/getting-started/ and ran into the following issues:
- It seems that a k8s 'Job' only works in the 'default' namespace? When I try a different namespace, the master pod keeps failing and being recreated.
- The poolwork pods terminate with 'Failed' status, while the master pod returns 'Success'. Is there any way to address that?
- It looks like the k8s 'Job' must have an explicit 'name' instead of 'generateName'; otherwise the master pod throws a 'Pod not found' error. Is this a known issue?
Thanks
Hi @ericxu10101, let me reply to those questions inline:
- seems like k8s 'Job' only works in 'default' namespace ? when I try on different namespace, the master pod keep failing and recreate.
That's true. Currently k8s jobs only work in the 'default' namespace. It's a limitation of the current version. I'm planning to add support for other namespaces in the next version. If you have time, you are also welcome to submit a PR for that.
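For reference, a pod can usually discover its own namespace from the service account files that Kubernetes mounts into it, which is one way to avoid hardcoding 'default'. The sketch below is not Fiber's implementation: `detect_namespace` and its parameters are hypothetical names, and it assumes the standard service account mount is present in the pod.

```python
from pathlib import Path

# Standard location of the namespace file inside a pod that has a
# service account mounted. Outside a pod this file normally does not exist.
NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def detect_namespace(path=NAMESPACE_FILE, fallback="default"):
    """Return the namespace recorded at `path`, or `fallback` if unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: assume we are not running in a pod.
        return fallback
    return text or fallback


if __name__ == "__main__":
    print(detect_namespace())
```

The resulting value could then be passed wherever a namespace argument is expected instead of a fixed string.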
- poolwork pods terminate with 'Failed' status, while master pod returns 'Success'. Any way to address that ?
The pods most likely failed because of the daemon thread inside each worker pod, which terminates the worker when it loses its connection to the master. It doesn't mean the whole job failed; it only means the worker exited in a special way, and it shouldn't affect the final result (if you notice otherwise, please create a new issue for it). I'm planning to address this in the next version so that the exit status of worker pods is set correctly.
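To illustrate the general pattern described here (not Fiber's actual code), a daemon thread can poll the connection and trigger a shutdown callback once it is lost. `start_watchdog`, `is_connected`, and `on_lost` are hypothetical names used only for this sketch.

```python
import threading
import time


def start_watchdog(is_connected, on_lost, interval=1.0):
    """Poll `is_connected` from a daemon thread and call `on_lost` once it
    returns False. Hypothetical helper, for illustration only."""

    def watch():
        # Keep sleeping while the connection check still succeeds.
        while is_connected():
            time.sleep(interval)
        # Connection is gone: hand control to the shutdown callback.
        on_lost()

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

If `on_lost` ends the process with a non-zero exit code (for example `lambda: os._exit(1)`), Kubernetes will typically report the pod as 'Failed' even though the work itself completed; exiting with code 0 would instead be reported as success.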
- It looks like the k8s 'Job' must have explicit 'name' instead of 'generateName', otherwise master pod throws 'Pod not found' error. Is it known issue ?
Currently 'name' is required for Fiber on K8s to work. Do you have a use case for 'generateName'? If it's general enough, it can be supported later.
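The difference between the two metadata styles can be sketched with plain dictionaries. The job names below are made up, and `has_explicit_name` is a hypothetical pre-submission check, not part of Fiber or the Kubernetes client.

```python
def has_explicit_name(manifest):
    """Return True when the manifest sets a non-empty metadata.name."""
    metadata = manifest.get("metadata") or {}
    return bool(metadata.get("name"))


# Explicit name: the object is created with exactly this name.
named = {"kind": "Job", "metadata": {"name": "fiber-pi-estimation"}}

# generateName: the API server appends a random suffix at creation time,
# so the final name is not known from the manifest alone.
generated = {"kind": "Job", "metadata": {"generateName": "fiber-pi-estimation-"}}

print(has_explicit_name(named))      # True
print(has_explicit_name(generated))  # False
```

Because the final name under 'generateName' is only known after creation, any code that looks up a pod by the name written in the manifest would not find it, which is consistent with the 'Pod not found' error reported above.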
@ericxu10101 Issues 1 and 2 have been fixed in #29 and #28. Feel free to install the newest version from the current master branch and test it out. Regarding issue 3, could you explain the problem in more detail?
@calio thanks a lot. That's awesome! Item 3 is not actually an issue; I just wanted to confirm the behavior. So it seems that requiring an explicit name instead of 'generateName' is the current limitation.
I think supporting 'generateName' could be a future feature. It would be handy when scheduling the Job.
Feel free to close this issue. And let me know where and how to track feature requests. Thanks.
Hi @ericxu10101, currently all features are tracked with GitHub issues. You can create a new issue with the tag "enhancement" and track progress there.
I'm getting a urllib3 "unable to establish connection" error in process.py with the pi-estimation example, and I am unable to resolve it. I tried granting all the permissions, but nothing seems to work. I'm also getting the following when running the fiber CLI:
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/cli.py", line 395, in run
job = k8s_backend.create_job(job_spec)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/kubernetes_backend.py", line 171, in create_job
raise e
File "/home/help_myjournal/.local/lib/python3.7/site-packages/fiber/kubernetes_backend.py", line 168, in create_job
self.default_namespace, body
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7320, in create_namespaced_pod
return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs) # noqa: E501
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7429, in create_namespaced_pod_with_http_info
collection_formats=collection_formats)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
_preload_content, _request_timeout, _host)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
_request_timeout=_request_timeout)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 397, in request
body=body)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 280, in POST
body=body)
File "/home/help_myjournal/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (501)
Reason: Unsupported method ('POST')
HTTP response headers: HTTPHeaderDict({'Server': 'BaseHTTP/0.3 Python/2.7.13', 'Date': 'Sun, 21 Feb 2021 11:26:45 GMT', 'Connection': 'close', 'Content-Type': 'text/html'})
HTTP response body: <head>
<title>Error response</title>
</head>
<body>
<h1>Error response</h1>
<p>Error code 501.
<p>Message: Unsupported method ('POST').
<p>Error code explanation: 501 = Server does not support this operation.
</body>
Does anyone have an idea what is happening here? Is it a bug?
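One detail in the response above may help narrow this down. The `Server` header reads `BaseHTTP/0.3 Python/2.7.13`, which is the banner of Python 2's built-in `BaseHTTPServer`, and a 501 "Unsupported method ('POST')" is what that module returns when the request handler defines no `do_POST`. This suggests the client may be reaching some other HTTP server rather than a Kubernetes API server, although the trace alone cannot confirm that. A minimal sketch of the check, where `looks_like_python_http_server` is a hypothetical helper:

```python
def looks_like_python_http_server(headers):
    """Return True when the Server header carries the banner of Python's
    built-in HTTP servers (BaseHTTPServer / SimpleHTTPServer / http.server)."""
    server = headers.get("Server", "")
    return server.startswith(("BaseHTTP/", "SimpleHTTP/"))


# Headers copied from the error response in the traceback above.
response_headers = {
    "Server": "BaseHTTP/0.3 Python/2.7.13",
    "Connection": "close",
    "Content-Type": "text/html",
}

print(looks_like_python_http_server(response_headers))  # True
```

If that is what is happening, checking which cluster address the active kubeconfig points at would be a reasonable next step.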