WIP: Assigning nodes to knative services

Open Shashankft9 opened this issue 1 year ago • 8 comments

Description

Adds the ability to assign Knative services to nodes.

Changes

  • adds nodeSelector, nodeAffinity and toleration to the podspec
  • adds update functions in podspec_helper.go for each of those fields
  • only supports ORed terms for the required clause of node affinity
  • supports removing previously added node selectors, but not node affinity terms or tolerations, since those have no clear identifier
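
To illustrate the "ORed terms" limitation: in Kubernetes, the entries of `nodeSelectorTerms` are ORed, while the `matchExpressions` inside a single term are ANDed. A minimal sketch, assuming each occurrence of the flag becomes its own term, might look like the following. The struct types are local stand-ins for the `k8s.io/api/core/v1` types of the same names, and `appendRequiredTerm` is a hypothetical helper, not the function added in this PR.

```go
package main

import "fmt"

// Local stand-ins for the Kubernetes core/v1 types of the same names,
// reduced to the fields this sketch needs.
type NodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

type NodeSelectorTerm struct {
	MatchExpressions []NodeSelectorRequirement
}

type NodeSelector struct {
	NodeSelectorTerms []NodeSelectorTerm
}

// appendRequiredTerm is a hypothetical helper: it wraps req in a new term,
// so every requirement is ORed with the ones added before it.
func appendRequiredTerm(sel *NodeSelector, req NodeSelectorRequirement) *NodeSelector {
	if sel == nil {
		sel = &NodeSelector{}
	}
	sel.NodeSelectorTerms = append(sel.NodeSelectorTerms, NodeSelectorTerm{
		MatchExpressions: []NodeSelectorRequirement{req},
	})
	return sel
}

func main() {
	var sel *NodeSelector
	sel = appendRequiredTerm(sel, NodeSelectorRequirement{
		Key: "topology.kubernetes.io/zone", Operator: "In", Values: []string{"antarctica-east1"},
	})
	sel = appendRequiredTerm(sel, NodeSelectorRequirement{
		Key: "disktype", Operator: "In", Values: []string{"ssd"},
	})
	// A node matches if it satisfies either term.
	fmt.Printf("%d ORed terms: %+v\n", len(sel.NodeSelectorTerms), sel.NodeSelectorTerms)
}
```

Expressing an AND of two requirements would need both of them inside one term, which this sketch (like the limitation described above) does not cover.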

Reference

Fixes https://github.com/knative/client/issues/1841

Release Note


Shashankft9 avatar Mar 19 '24 12:03 Shashankft9

@dsimansk hey, I'm going to add some more tests and maybe refactor a few things. Early feedback would be appreciated!

Shashankft9 avatar Mar 19 '24 12:03 Shashankft9

Codecov Report

Attention: Patch coverage is 75.53957%, with 34 lines in your changes missing coverage. Please review.

Project coverage is 76.82%. Comparing base (cbb6f5c) to head (57eeed4). Report is 9 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/kn/flags/podspec_helper.go | 77.27% | 16 Missing and 9 partials :warning: |
| pkg/kn/flags/podspec.go | 76.00% | 3 Missing and 3 partials :warning: |
| pkg/util/parsing_helper.go | 25.00% | 3 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1924      +/-   ##
==========================================
+ Coverage   74.58%   76.82%   +2.23%     
==========================================
  Files         207      207              
  Lines       15567    12892    -2675     
==========================================
- Hits        11611     9904    -1707     
+ Misses       3167     2187     -980     
- Partials      789      801      +12     

codecov[bot] avatar Mar 19 '24 12:03 codecov[bot]

@Shashankft9 thanks for looking into this feature. I'm still not 100% convinced we should have it though.

Given that every option is hidden behind its own specific feature flag, there's no good way to determine whether the flags are actually usable on a given Serving instance until they are executed against the webhook, which might reject them.

I wonder what the error message from Serving's webhook looks like, and whether it gives users a good hint about why the Ksvc creation failed.

Moreover, looking at the "verbosity" of the required input, I doubt the overall usefulness. Subjectively, I'd opt for a KSVC stored in YAML format for such advanced configuration.

I mean this kind of verbosity:

--node-affinity Type="Preferred",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1",Weight="1"
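
For reference, the shape of that input can be seen in a small parsing sketch. This is an illustration only, not the parser from this PR: `parseFlagFields` is a hypothetical helper that assumes comma-separated `Key=Value` pairs, and the space-separated `Values` convention is taken from the examples in this thread. The shell normally strips the double quotes before kn sees the argument; the sketch also tolerates them if they survive.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlagFields is a hypothetical helper that splits a flag value of the
// form "Key=Value,Key=Value" into a map. It rejects entries without "=",
// empty keys and duplicate keys. Values may be empty.
func parseFlagFields(raw string) (map[string]string, error) {
	fields := make(map[string]string)
	for _, part := range strings.Split(raw, ",") {
		key, value, found := strings.Cut(part, "=")
		key = strings.TrimSpace(key)
		if !found || key == "" {
			return nil, fmt.Errorf("expected Key=Value, got %q", part)
		}
		if _, dup := fields[key]; dup {
			return nil, fmt.Errorf("duplicate field %q", key)
		}
		// Drop surrounding quotes in case the shell did not strip them.
		fields[key] = strings.Trim(strings.TrimSpace(value), `"`)
	}
	return fields, nil
}

func main() {
	raw := `Type=Required,Key=topology.kubernetes.io/zone,Operator=In,Values=antarctica-east1 antarctica-east2`
	fields, err := parseFlagFields(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Values holds a space-separated list in the examples from this thread.
	fmt.Println(fields["Type"], fields["Key"], fields["Operator"], strings.Fields(fields["Values"]))
}
```

Even in this reduced form, the user has to remember five field names and their allowed values, which is the verbosity concern raised above.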

/cc @rhuss any thoughts?

dsimansk avatar Mar 19 '24 17:03 dsimansk

@dsimansk thanks for the feedback, I'll work on the pointers. Regarding the usefulness, we currently have one use case that needs something like this in the kn CLI, which I will try to justify below:

In some of our edge cloud deployments and dev Kubernetes clusters, we use Knative Functions with Tekton (on-cluster builds). In addition to the func CLI itself, we have also taken the function client and built a controller on top of it that provides a CRD-like UX for function creation. Currently an on-cluster build has three Tekton tasks: git clone, build and deploy. This poses a problem when someone has to add things like scaling configuration, TLS and other Serving-specific configuration. With the func CLI this is quite easy, but with on-cluster builds the user has to make the changes in func.yaml, commit them to git and then run the on-cluster build, which goes against the UX that we want to provide through the CRD. So how do we solve this? We added another Tekton task after deploy, which we call kn-patch. We now accept scaling configuration in our CRD and pass it to this task, which applies all required changes without the user ever making changes in git, or even knowing that the function exists somewhere in git.

But then we came across a scenario in our edge cloud deployments where certain nodes in a Kubernetes cluster were dedicated to specific workloads and had a different MTU set on them, which affected the usual service/pod networking on those nodes. This means our functions must not be scheduled on those nodes, so we have to pin functions to specific nodes, hence the need for these mechanisms in kn.

Does that make sense purely from the use-case perspective? I am not sure how others use the kn CLI, but this is how we currently use it. Maybe this could also be useful for func's CRD and Operator story?

> I wonder what the error message from Serving's webhook looks like, and whether it gives users a good hint about why the Ksvc creation failed.

I can check and post the output here, but the last time I tried, I think Serving's webhook gave a clear hint about it.

Shashankft9 avatar Mar 19 '24 18:03 Shashankft9

Here's what it looks like when I disable the node affinity, node selector and toleration feature flags:

root@faas-cluster-xnts4:~# kn service create nodeaffinitytest --image knativesamples/helloworld --node-affinity Type="Required",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1 antarctica-east2"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.affinity
Run 'kn --help' for usage
root@faas-cluster-xnts4:~# kn service create tolerationtest --image knativesamples/helloworld --toleration Key="node-role.kubernetes.io/master",Effect="NoSchedule",Operator="Equal",Value=""
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.tolerations
Run 'kn --help' for usage
root@faas-cluster-xnts4:~# kn service create nodeselectortest --image knativesamples/helloworld --node-selector Disktype="ssd"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.nodeSelector
Run 'kn --help' for usage
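
The rejected field paths above could in principle be mapped back to the Serving feature flags that gate them. The following is a minimal sketch of that idea, not something this PR implements: `featureFlagsForRejection` is a hypothetical helper, and the flag names are the `config-features` keys as I understand them from the Knative Serving documentation, so treat them as an assumption to verify.

```go
package main

import (
	"fmt"
	"strings"
)

// featureFlagForField maps a rejected field path to the Serving feature flag
// assumed to gate it (keys of the config-features ConfigMap).
var featureFlagForField = map[string]string{
	"spec.template.spec.affinity":     "kubernetes.podspec-affinity",
	"spec.template.spec.nodeSelector": "kubernetes.podspec-nodeselector",
	"spec.template.spec.tolerations":  "kubernetes.podspec-tolerations",
}

// rejectionMarker is the text that precedes the field list in the webhook
// errors shown in this thread.
const rejectionMarker = "must not set the field(s): "

// featureFlagsForRejection is a hypothetical helper: it extracts the rejected
// field paths from a webhook error and returns the matching feature flags.
// It returns nil when the message has no recognised field.
func featureFlagsForRejection(msg string) []string {
	_, after, found := strings.Cut(msg, rejectionMarker)
	if !found {
		return nil
	}
	var flags []string
	for _, field := range strings.Split(after, ",") {
		if flag, ok := featureFlagForField[strings.TrimSpace(field)]; ok {
			flags = append(flags, flag)
		}
	}
	return flags
}

func main() {
	msg := `admission webhook "validation.webhook.serving.knative.dev" denied the request: ` +
		`validation failed: must not set the field(s): spec.template.spec.affinity`
	if flags := featureFlagsForRejection(msg); len(flags) > 0 {
		fmt.Println("hint: enable", strings.Join(flags, ", "), "in Serving's config-features ConfigMap")
	}
}
```

Matching on the error text is brittle, since the message format belongs to Serving and may change, so a hint like this would be a convenience rather than a guarantee.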

Shashankft9 avatar Mar 19 '24 18:03 Shashankft9

Thanks for the extensive reply addressing the usefulness concern. I'm getting a better picture now.

One idea for addressing my concern might be to introduce an "experimental" or "advanced" section in the help message, i.e. to de-clutter the current list of kn service flags by splitting them into sections, clearly indicating that these flags require additional configuration, such as enabling a feature flag in Serving. I'll take a look at how to achieve this in spf13/cobra; I recall there are a few options for creating subsections. And of course we'd need descriptive subsection names (the naming game is always the hardest part :)).

dsimansk avatar Mar 21 '24 09:03 dsimansk

Hello @dsimansk, this is ready for another review. I have reverted the changes that created subsections in the flags, and added tests.

Context: the flag subsection work was reverted because of a pending feature in cobra; more here: https://cloud-native.slack.com/archives/C04LY4SKBQR/p1713175380354749

Shashankft9 avatar Apr 23 '24 13:04 Shashankft9

@Shashankft9 could you please try rerunning ./hack/build.sh -c? That should run gofmt and fix the formatting issues.

dsimansk avatar Apr 24 '24 11:04 dsimansk

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dsimansk, Shashankft9

knative-prow[bot] avatar May 03 '24 12:05 knative-prow[bot]

@dsimansk ack, I can work on those improvements in a follow-up PR. Thanks!

Shashankft9 avatar May 06 '24 05:05 Shashankft9