flyte icon indicating copy to clipboard operation
flyte copied to clipboard

Use serviceAccount for Ray from the propeller configuration

Open peterghaddad opened this issue 1 year ago • 2 comments

Tracking issue

Why are the changes needed?

Currently, the serviceAccount pulls from the securityContext of the k8s cluster that the FlytePropeller is deployed at. Instead, have the option to configure the propeller to use a specific serviceAccountName on k8s when the configuration value is set to a value other than "default".

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [ ] All new and existing tests passed.
  • [ ] All commits are signed-off.

Related PRs

Docs link

peterghaddad avatar Oct 07 '24 20:10 peterghaddad

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 36.35%. Comparing base (b9900fe) to head (385c34e). Report is 199 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5821   +/-   ##
=======================================
  Coverage   36.35%   36.35%           
=======================================
  Files        1304     1304           
  Lines      110147   110147           
=======================================
+ Hits        40042    40045    +3     
+ Misses      65938    65935    -3     
  Partials     4167     4167           
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.60% <ø> (ø)
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.26% <ø> (+0.04%) :arrow_up:
unittests-flyteidl 7.17% <ø> (ø)
unittests-flyteplugins 53.35% <100.00%> (ø)
unittests-flytepropeller 42.02% <ø> (ø)
unittests-flytestdlib 55.37% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 07 '24 20:10 codecov[bot]

In flyteplugins/go/tasks/plugins/k8s/ray/ray.go, I noticed that submitterPodTemplate := buildSubmitterPodTemplate(headPodSpec, objectMeta, taskCtx) uses the headPodSpec defined early in the source. See here. Even when the configuration is changed it doesn't propagate down when calling podSpec.ServiceAccountName = cfg.ServiceAccount

Should we be using the built out headGroupSpec instead of the one originally defined? See here.

@eapolinario any thoughts here? We also observed that modifying the serviceAccount through the podTemplate in the Flyte sdk does not propagate.

peterghaddad avatar Oct 08 '24 13:10 peterghaddad

Even when the configuration is changed it doesn't propagate down when calling podSpec.ServiceAccountName = cfg.ServiceAccount

The issue is that the call to GetServiceAccountNameFromTaskExecutionMetadata always gets the security context from task execution metadata, which has the run_as set.

So even though we set the service account in https://github.com/flyteorg/flyte/blob/ffd72a09cb09079beccec5908ebd45b430ff81c5/flyteplugins/go/tasks/plugins/k8s/ray/ray.go#L119 this ends up being overridden in the call to constructRayJob.

As you're proposing in this PR, the crux of the change should be around https://github.com/flyteorg/flyte/blob/ffd72a09cb09079beccec5908ebd45b430ff81c5/flyteplugins/go/tasks/plugins/k8s/ray/ray.go#L192-L195 What if we used only used the service account name from the config if that set at all (i.e. instead of relying on setting its default value to "default" make that "") and if that's not set then we call serviceAccountName = flytek8s.GetServiceAccountNameFromTaskExecutionMetadata(taskCtx.TaskExecutionMetadata()).

@peterghaddad , wdyt?

Also, cc: @pingsutw

eapolinario avatar Oct 25 '24 19:10 eapolinario

Even when the configuration is changed it doesn't propagate down when calling podSpec.ServiceAccountName = cfg.ServiceAccount

The issue is that the call to GetServiceAccountNameFromTaskExecutionMetadata always gets the security context from task execution metadata, which has the run_as set.

So even though we set the service account in

https://github.com/flyteorg/flyte/blob/ffd72a09cb09079beccec5908ebd45b430ff81c5/flyteplugins/go/tasks/plugins/k8s/ray/ray.go#L119

this ends up being overridden in the call to constructRayJob. As you're proposing in this PR, the crux of the change should be around

https://github.com/flyteorg/flyte/blob/ffd72a09cb09079beccec5908ebd45b430ff81c5/flyteplugins/go/tasks/plugins/k8s/ray/ray.go#L192-L195

What if we used only used the service account name from the config if that set at all (i.e. instead of relying on setting its default value to "default" make that "") and if that's not set then we call serviceAccountName = flytek8s.GetServiceAccountNameFromTaskExecutionMetadata(taskCtx.TaskExecutionMetadata()). @peterghaddad , wdyt?

Also, cc: @pingsutw

I like this approach. I'll submit a PR for this change.

peterghaddad avatar Oct 27 '24 19:10 peterghaddad