[Core feature] Allow different resource configs to ray worker and head

Open ByronHsu opened this issue 2 years ago • 6 comments

Motivation: Why do you think this is important?

Currently, the Ray head and workers share the same pod template, so they are launched with the same pod resources at runtime. In some cases, however, users want GPUs only on the worker nodes and not on the head. In other cases, users want to create two worker groups: one with CPUs and another with GPUs.

Goal: What should the final outcome look like, ideally?

Users can pass different configs to the ray worker and head.

For example:

ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(
        requests=Resources(mem="64Gi", cpu="4"),
        limits=Resources(mem="64Gi", cpu="4")
    ),
    worker_node_config=[
        WorkerNodeConfig(
            group_name="cpu-group",
            replicas=4,
            requests=Resources(mem="256Gi", cpu="64"),
            limits=Resources(mem="256Gi", cpu="64"),
        ),
        WorkerNodeConfig(
            group_name="gpu-group",
            replicas=2,
            requests=Resources(mem="480Gi", cpu="60", gpu="2"),
            limits=Resources(mem="480Gi", cpu="60", gpu="2")
        )
    ],
)
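
To make the shape of the proposal concrete, here is a minimal sketch that models the config above with plain dataclasses. These classes are hypothetical stand-ins that only mirror the field names used in the example; they are not the existing flytekit or Ray plugin classes, and `gpu_groups` is an illustrative helper, not a proposed API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resources:
    # Hypothetical stand-in: Kubernetes-style quantity strings, all optional.
    cpu: Optional[str] = None
    mem: Optional[str] = None
    gpu: Optional[str] = None


@dataclass
class HeadNodeConfig:
    # Resources for the head pod only.
    requests: Optional[Resources] = None
    limits: Optional[Resources] = None


@dataclass
class WorkerNodeConfig:
    # One entry per worker group, each with its own replica count and resources.
    group_name: str
    replicas: int
    requests: Optional[Resources] = None
    limits: Optional[Resources] = None


@dataclass
class RayJobConfig:
    head_node_config: Optional[HeadNodeConfig] = None
    worker_node_config: List[WorkerNodeConfig] = field(default_factory=list)


def gpu_groups(cfg: RayJobConfig) -> List[str]:
    """Return the names of worker groups whose requests include a GPU."""
    return [
        w.group_name
        for w in cfg.worker_node_config
        if w.requests is not None and w.requests.gpu
    ]
```

With this shape, the head can omit `gpu` entirely while each worker group carries its own requests and limits, which covers both the GPU-only-on-workers case and the mixed CPU/GPU groups case.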

Describe alternatives you've considered

No alternatives considered.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

ByronHsu avatar Nov 07 '23 18:11 ByronHsu

This looks like a reasonable feature. Thanks for raising it, @ByronHsu!

This would be a great first issue to work on.

eapolinario avatar Nov 09 '23 21:11 eapolinario

Either @troychiu or I will take this on.

ByronHsu avatar Nov 10 '23 01:11 ByronHsu

This is relevant to other plugins that follow a driver-worker pattern (e.g. ray, spark, kfoperator).

jeevb avatar Nov 15 '23 00:11 jeevb

FYI: I created an additional issue which is closely related to this one: https://github.com/flyteorg/flyte/issues/4674

It extends the idea by also allowing pod specifications.

vkaiser-mb avatar Jan 04 '24 10:01 vkaiser-mb

@ByronHsu Any updates? What do you think about my extension?

vkaiser-mb avatar Jan 15 '24 10:01 vkaiser-mb

We'll regroup and re-prioritize this feature in the coming sprints.

eapolinario avatar Apr 29 '24 17:04 eapolinario