feat: Add GPU node selector to scheduler deployment

Open haitwang-cloud opened this issue 1 year ago • 1 comments

The changes to deployment.yaml add a GPU node selector to the scheduler deployment, so that the scheduler selects nodes whose gpu label is set to "on". The container's environment variables are updated to include the NODE_SELECTOR_GPU variable.

What type of PR is this?

/kind feature

What this PR does / why we need it: This PR introduces a new feature to enhance our scheduler by allowing it to filter BM (Bare Metal) nodes based on a node selector. This is particularly useful in our cluster environment where only BM nodes have GPUs, ensuring that tasks requiring GPUs are scheduled appropriately.

Which issue(s) this PR fixes: Fixes https://github.com/Project-HAMi/HAMi/issues/375

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

haitwang-cloud avatar Jul 08 '24 07:07 haitwang-cloud

@wawa0210 @archlitchi

haitwang-cloud avatar Jul 09 '24 02:07 haitwang-cloud

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haitwang-cloud. Once this PR has been reviewed and has the lgtm label, please assign wawa0210 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

hami-robott[bot] avatar Jul 17 '24 08:07 hami-robott[bot]

An extra env variable will be set in the vgpu-scheduler-extender container if we enable vmBmMixMode:

        - name: vgpu-scheduler-extender
          image: projecthami/hami:v2.3.12
          imagePullPolicy: "IfNotPresent"
          env:
            - name: NODE_SELECTOR_GPU
              value: "on"
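
For context, a node that this selector is meant to match would carry the corresponding label. The fragment below is only a minimal sketch: the node name is hypothetical, and the only detail taken from this PR is the gpu label with the value "on".

```yaml
# Illustrative Node metadata; not part of this PR's diff.
apiVersion: v1
kind: Node
metadata:
  name: bm-gpu-node-1   # hypothetical node name
  labels:
    gpu: "on"           # the label value referenced by NODE_SELECTOR_GPU
```

Nodes without this label (for example, VM nodes without GPUs in a mixed cluster) would presumably be filtered out by the scheduler.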

haitwang-cloud avatar Jul 17 '24 08:07 haitwang-cloud

Fixes the bug introduced in this pull request: https://github.com/Project-HAMi/HAMi/pull/354, as it is causing the scheduler to restart.

haitwang-cloud avatar Jul 19 '24 02:07 haitwang-cloud

/lgtm

archlitchi avatar Jul 22 '24 09:07 archlitchi