feat: Support debugging webhooks locally
What this PR does / why we need it: Add a method for quick local development and debugging, enabling developers to rapidly develop and debug the operator on their local machines, such as a Mac.
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2226
Checklist:
- [x] Docs included if any changes are user facing
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Even if the certificates are configured locally, the kube-apiserver may still not be able to access the local webhook server (without using kind).
Should we add an env ENABLE_WEBHOOKS=false to disable webhooks for easier local debugging?
@shalousun @tenzen-y @andreyvelich What do you think?
Even if the certificates are configured locally, the kube-apiserver may still not be able to access the local webhook server (without using kind).
Should we add an env
ENABLE_WEBHOOKS=falseto disable webhooks for easier local debugging?@shalousun @tenzen-y @andreyvelich What do you think?
I agree with this approach. In my local environment, I actually disabled webhooks by adding the ENABLE_WEBHOOKS flag. This is a modification I made myself, which you can see in this commit: https://github.com/shalousun/training-operator/commit/45d9ebb8b42a429fa6de3ff5272de02d3dba415b.
Even if the certificates are configured locally, the kube-apiserver may still not be able to access the local webhook server (without using kind).
Should we add an env
ENABLE_WEBHOOKS=falseto disable webhooks for easier local debugging?@shalousun @tenzen-y @andreyvelich What do you think?
Instead of the special environment variable, you can use the "Ignore" failurePolicy like here: https://github.com/kubeflow/training-operator/blob/3f9b0a41663456998e352212dfbfd1386b48d7ab/manifests/base/webhook/manifests.yaml#L14
Even if the certificates are configured locally, the kube-apiserver may still not be able to access the local webhook server (without using kind). Should we add an env
ENABLE_WEBHOOKS=falseto disable webhooks for easier local debugging? @shalousun @tenzen-y @andreyvelich What do you think?Instead of the special environment variable, you can use the "Ignore" failurePolicy like here:
https://github.com/kubeflow/training-operator/blob/3f9b0a41663456998e352212dfbfd1386b48d7ab/manifests/base/webhook/manifests.yaml#L14
First, I agree with using the failurePolicy setting, which is an elegant way and does not cause destructive changes to webhooks already existing in the cluster (for example, when starting an operator on a local PC to connect to a remote test cluster for debugging). However, for the scenario of rapid local debugging, we also need to skip the webhook startup checks, otherwise, there will be certificate verification issues. Therefore, providing the ENABLE_WEBHOOKS configuration should be necessary. Secondly, for modifying the failurePolicy, I believe it should be handled through a kubebuilder patch that is automatically applied, and this patch should be kept separately in the overlays/local directory.
@shalousun Do you have a startup check if you set failurePolicy: Ignore ?
Pull Request Test Coverage Report for Build 10432779786
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 6 unchanged lines in 1 file lost coverage.
- Overall coverage decreased (-0.03%) to 33.435%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/controller.v1/mpi/mpijob_controller.go | 6 | 80.67% |
| <!-- | Total: | 6 |
| Totals | |
|---|---|
| Change from base Build 10408646954: | -0.03% |
| Covered Lines: | 3945 |
| Relevant Lines: | 11799 |
💛 - Coveralls
@shalousun Do you have a startup check if you set
failurePolicy: Ignore?
ENABLE_WEBHOOKS=false
I have tested by setting failurePolicy: Ignore which allows the webhook to not enforce failure checks, enabling normal testing of the operator in local development tools. However, the webhook configuration is auto-generated, and the generated policy is embedded in the Go code used to write the webhook, such as
// +kubebuilder:webhook:path=/validate-kubeflow-org-v1-pytorchjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=pytorchjobs,verbs=create;update,versions=v1,name=validator.pytorchjob.training-operator.kubeflow.org,admissionReviewVersions=v1
var _ webhook.CustomValidator = &Webhook{}
Therefore, providing a specific template for local debugging under the overlays directory is a better approach as it does not require any manual modifications. Finally, adding ENABLE_WEBHOOKS=false is also necessary as it allows local development and debugging without having to worry about the complex certificate generation process.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.