WIP: Assigning nodes to knative services
Description
Adds the ability to assign Knative services to nodes.
Changes
- Adds `nodeSelector`, `nodeAffinity` and `toleration` to the pod spec (see the sketch below)
- Adds update functions in `podspec_helper.go` for each of those fields
- Only supports ORed terms for the required clause of node affinity
- Supports updating previously added node selectors, but no removals for node affinity and toleration, since there is no clear identifier.
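For illustration, here is a minimal sketch of what one such update helper might look like, assuming the upstream corev1 types; the actual implementation in `podspec_helper.go` may differ:

```go
package helpers

import (
	corev1 "k8s.io/api/core/v1"
)

// UpdateNodeSelectors merges the given key/value pairs into the PodSpec's
// nodeSelector map. Existing keys are overwritten and nothing is removed,
// mirroring the "no removals" limitation noted above.
// Illustrative sketch only, not the PR's actual code.
func UpdateNodeSelectors(spec *corev1.PodSpec, toUpdate map[string]string) {
	if spec.NodeSelector == nil {
		spec.NodeSelector = map[string]string{}
	}
	for key, value := range toUpdate {
		spec.NodeSelector[key] = value
	}
}
```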
Reference
Fixes https://github.com/knative/client/issues/1841
Release Note
@dsimansk hey, going to add some more tests, and maybe refactor a few things; early feedback would be appreciated!
Codecov Report
Attention: Patch coverage is 75.53957% with 34 lines in your changes missing coverage. Please review.
Project coverage is 76.82%. Comparing base (cbb6f5c) to head (57eeed4). Report is 9 commits behind head on main.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1924      +/-   ##
==========================================
+ Coverage   74.58%   76.82%   +2.23%
==========================================
  Files         207      207
  Lines       15567    12892    -2675
==========================================
- Hits        11611     9904    -1707
+ Misses       3167     2187     -980
- Partials      789      801      +12
```
@Shashankft9 thanks for looking into this feature. I'm still not 100% convinced we should have it though.
Given that every option is hidden behind its own specific flag, there's no good way to determine whether the flags are actually usable on a Serving instance until the request is executed against the webhook, which might reject it.
I wonder what the error message from Serving's webhook looks like, and whether it propagates a good hint to users about why the Ksvc creation failed.
Moreover, looking at the "verbosity" of the required input, I doubt the overall usefulness. Subjectively, I'd opt for a Ksvc stored in YAML format for such advanced configuration.
I mean this kind of verbosity:

```
--node-affinity Type="Preferred",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1",Weight="1"
```
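For context, parsing such a value boils down to something like the following sketch; the function name and return shape are hypothetical, and the PR's actual parser may differ:

```go
package flags

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// parseNodeAffinity splits a flag value such as
//   Type="Preferred",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1",Weight="1"
// into the affinity type and a NodeSelectorRequirement.
// Hypothetical sketch, not the PR's actual implementation.
func parseNodeAffinity(value string) (string, corev1.NodeSelectorRequirement, error) {
	fields := map[string]string{}
	for _, segment := range strings.Split(value, ",") {
		kv := strings.SplitN(segment, "=", 2)
		if len(kv) != 2 {
			return "", corev1.NodeSelectorRequirement{}, fmt.Errorf("invalid segment %q", segment)
		}
		fields[kv[0]] = strings.Trim(kv[1], `"`)
	}
	req := corev1.NodeSelectorRequirement{
		Key:      fields["Key"],
		Operator: corev1.NodeSelectorOperator(fields["Operator"]),
		Values:   strings.Fields(fields["Values"]), // space-separated values
	}
	return fields["Type"], req, nil
}
```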
/cc @rhuss any thoughts?
@dsimansk thanks for the feedback, I'll work on the pointers. But regarding the usefulness: we currently have one use case that needs something like this in the kn CLI, which I'll try to justify below.
In some of our edge cloud deployments and dev Kubernetes clusters, we are using Knative Functions with Tekton (on-cluster builds). In addition to its current CLI form, we have also used the function client to build a controller on top of it that provides a CRD-like UX for function creation. Currently the on-cluster build has three Tekton tasks - git clone, build, and deploy - but this poses a problem when someone has to add things like scaling configuration, TLS, and other Serving-specific configuration. Doing this with the func CLI is quite easy, but doing it through on-cluster builds means the user has to change func.yaml, commit it to Git, and then rerun the on-cluster build, which goes quite against the UX we want to provide through the CRD.
So, how do we solve this?
We added another Tekton task after deploy, which we call kn-patch. Now we accept scaling configurations in our CRD and pass them to this task, which applies all the required changes without the user ever touching Git, or even knowing that the function exists somewhere in Git.
But then we came across a scenario in our edge cloud deployments where certain nodes in a Kubernetes cluster were dedicated to specific workloads and had a different MTU packet size set on them, which had an effect on the usual service/pod networking for those nodes. Essentially, our functions must not be scheduled on those nodes, so we have to pin functions to specific nodes - hence the need for these mechanisms in kn.
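For instance, pinning function pods to general-purpose nodes could look roughly like the sketch below; the label key and value are hypothetical examples, not anything from the PR:

```go
package config

import corev1 "k8s.io/api/core/v1"

// functionAffinity pins function pods to nodes labeled for general
// workloads, keeping them off the special-MTU nodes described above.
// The label key/value are hypothetical.
var functionAffinity = corev1.Affinity{
	NodeAffinity: &corev1.NodeAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
			NodeSelectorTerms: []corev1.NodeSelectorTerm{{
				MatchExpressions: []corev1.NodeSelectorRequirement{{
					Key:      "workload-class",
					Operator: corev1.NodeSelectorOpIn,
					Values:   []string{"general"},
				}},
			}},
		},
	},
}
```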
Does that make sense purely from a use-case perspective? I'm not sure how others are using the kn CLI, but this is how we are currently using it. Maybe this is something that could be useful for func's CRD and Operator story?
> I wonder what the error message from Serving's webhook looks like, and whether it propagates a good hint to users about why the Ksvc creation failed.
I can check and post it here, but I think the last time I tried, Serving's webhook gave a clear hint about it.
Here's what it looks like when node affinity, node selector, and toleration are disabled via the feature flags:
```
root@faas-cluster-xnts4:~# kn service create nodeaffinitytest --image knativesamples/helloworld --node-affinity Type="Required",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1 antarctica-east2"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.affinity
Run 'kn --help' for usage

root@faas-cluster-xnts4:~# kn service create tolerationtest --image knativesamples/helloworld --toleration Key="node-role.kubernetes.io/master",Effect="NoSchedule",Operator="Equal",Value=""
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.tolerations
Run 'kn --help' for usage

root@faas-cluster-xnts4:~# kn service create nodeselectortest --image knativesamples/helloworld --node-selector Disktype="ssd"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.nodeSelector
Run 'kn --help' for usage
```
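For readers following along: these PodSpec fields are gated by Knative Serving's extension feature flags in the config-features ConfigMap. A minimal sketch of the relevant entries, expressed as a Go object for illustration (the flag keys come from Knative Serving; the literal itself is only an example):

```go
package config

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// servingFeatureFlags models the config-features entries that must be
// "enabled" before Serving's webhook accepts affinity, nodeSelector,
// and tolerations on the Ksvc PodSpec. Illustrative sketch only.
var servingFeatureFlags = corev1.ConfigMap{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "config-features",
		Namespace: "knative-serving",
	},
	Data: map[string]string{
		"kubernetes.podspec-affinity":     "enabled",
		"kubernetes.podspec-nodeselector": "enabled",
		"kubernetes.podspec-tolerations":  "enabled",
	},
}
```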
Thanks for the extensive reply addressing the usefulness concern. I'm getting a better picture now.
One idea to address my concern might be introducing an "experimental" or "advanced" section in the help message, i.e. de-cluttering the current list of kn service flags by splitting them into sections, to clearly indicate that these flags require additional configuration, like enabling a feature flag in Serving.
I'll take a look at how to achieve that in spf13/cobra; I recall there are a few options for creating subsections. And of course descriptive sub-section names (the naming game is always the hardest part :)). A rough sketch of one option is below.
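A minimal sketch of one way to do it, assuming cobra has no built-in flag grouping: keep two pflag.FlagSets and render them as separate sections via a custom usage function (all names here are illustrative, not the kn client's actual flags):

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
	"github.com/spf13/pflag"
)

func main() {
	cmd := &cobra.Command{Use: "kn service create NAME"}

	// Standard flags shown in the main section.
	standard := pflag.NewFlagSet("standard", pflag.ContinueOnError)
	standard.String("image", "", "Image to run")

	// Feature-gated flags shown in their own "Advanced" section.
	advanced := pflag.NewFlagSet("advanced", pflag.ContinueOnError)
	advanced.String("node-selector", "", "Node selector (requires a Serving feature flag)")

	cmd.Flags().AddFlagSet(standard)
	cmd.Flags().AddFlagSet(advanced)

	// Render the two sets as separate help sections.
	cmd.SetUsageFunc(func(c *cobra.Command) error {
		fmt.Fprintf(c.OutOrStderr(),
			"Usage:\n  %s\n\nFlags:\n%s\nAdvanced Flags (feature-gated):\n%s",
			c.UseLine(), standard.FlagUsages(), advanced.FlagUsages())
		return nil
	})
	_ = cmd.Execute()
}
```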
Hello @dsimansk, this is ready for another review - I have reverted the changes for creating subsections in flags, and added tests.
Context: the flag-subsection work was reverted because of a pending feature in cobra; more here - https://cloud-native.slack.com/archives/C04LY4SKBQR/p1713175380354749
@Shashankft9 could you please try to rerun ./hack/build.sh -c? That should execute gofmt and fix the formatting issues.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dsimansk, Shashankft9
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
- ~~OWNERS~~ [dsimansk]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@dsimansk ack, I can work on those improvements in a follow-up PR - thanks!