wangjianyu
**What is your problem or scenario?** **Koordinator compatibility problem in a KIND scenario.** **Procedure** _step 1._ Install Docker and KIND on a local Mac machine. _step 2._ Create a cluster by...
**What happened**: A node has 8 GPU cards, each with 80 Gi of GPU memory. I want to use four cards, with 40 Gi of GPU memory on each, via...
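In the current design, each plugin separately checks in the Filter phase whether some Reservation on the node can satisfy the resource request. However, it is entirely possible that one Reservation satisfies CPU but not GPU, while another Reservation satisfies GPU but not CPU. Such a node still passes the Filter phase, so this part needs to be redesigned. The scheduler already supports the ReservationNominator interface, which is implemented by the Reservation plugin. The implementation of the NominateReservation function calls every plugin that implements the ReservationFilter and ReservationScore interfaces and tries to select the Reservation on the node that best satisfies the resource requirements. ``` //...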
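The mismatch described above can be illustrated with a minimal Go sketch. It assumes a simplified two-dimensional resource model; the types `Resources` and `Reservation` and the functions `perPluginFilter` and `nominate` are hypothetical names made up for illustration and are not Koordinator's actual types or interfaces.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource vector (not a Koordinator type).
type Resources struct {
	CPU int // CPU cores
	GPU int // GPU cards
}

// Reservation is a hypothetical stand-in for a reservation on a node.
type Reservation struct {
	Name string
	Free Resources
}

// perPluginFilter mimics the per-plugin check: each resource dimension is
// validated independently, against any reservation on the node.
func perPluginFilter(rs []Reservation, req Resources) bool {
	cpuOK, gpuOK := false, false
	for _, r := range rs {
		if r.Free.CPU >= req.CPU {
			cpuOK = true
		}
		if r.Free.GPU >= req.GPU {
			gpuOK = true
		}
	}
	return cpuOK && gpuOK
}

// nominate returns the first reservation that satisfies every dimension at
// once, which is the property the per-plugin check fails to guarantee.
func nominate(rs []Reservation, req Resources) (string, bool) {
	for _, r := range rs {
		if r.Free.CPU >= req.CPU && r.Free.GPU >= req.GPU {
			return r.Name, true
		}
	}
	return "", false
}

func main() {
	rs := []Reservation{
		{Name: "r1", Free: Resources{CPU: 8, GPU: 0}}, // enough CPU, no GPU
		{Name: "r2", Free: Resources{CPU: 1, GPU: 4}}, // enough GPU, too little CPU
	}
	req := Resources{CPU: 4, GPU: 2}

	fmt.Println("per-plugin filter passes:", perPluginFilter(rs, req))
	_, ok := nominate(rs, req)
	fmt.Println("a single reservation fits:", ok)
}
```

With these inputs the per-plugin check accepts the node even though no single reservation can host the request, which is why selecting one reservation across all dimensions is needed.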
### Ⅰ. Describe what this PR does ### Ⅱ. Does this pull request fix one issue? ### Ⅲ. Describe how to verify it ### Ⅳ. Special notes for reviews ###...
**What is your proposal**: Provide an evolvable End to End Solution for Koordinator Device Management **Why is this needed**: Koordinator already supports two functions in the scheduler: GPU shared scheduling...
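**What happened**: DCGM exposes Pod GPU metrics through the PodResources interface, which depends on kubelet's GPU allocation results. However, in Koordinator the GPU allocation is made by the scheduler, so DCGM does not work correctly here. **What you expected to happen**: Users can see, in some way, the same metrics that DCGM provides. **How to...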
### Ⅰ. Describe what this PR does This is just to remove this, but I did not want to add a lock, so it is handled this way. ### Ⅱ. Does this pull request fix one issue? ### Ⅲ. Describe how to verify it ### Ⅳ. Special notes for...