llmaz Model aware scheduling

What would you like to be added:

Right now, model management in a cluster is a tricky problem. Models are big, so we need to cache them on the node, just like images. However, kubelet takes over lifecycle management for images but not for files, and that gap will not be tackled in the near future. So maybe we need to manage the models ourselves and make the scheduler aware of them when it makes pod placement decisions.

Why is this needed:

Efficient pod scheduling with models

Completion requirements:

This enhancement requires the following artifacts:

[x] Design doc
[ ] API change
[x] Docs update

The artifacts should be linked in subsequent comments.

Aug 19 '24 02:08 kerthcet

/kind feature

Aug 19 '24 02:08 kerthcet

I see that the scheduler's built-in plugins currently include volumebinding, volumerestrictions, and volumezone. Do we have a scoring plugin similar to imagelocality (e.g. volumelocality) to cover this scenario?

Sep 22 '24 12:09 googs1025

Yes, basically the idea is that models are located on different nodes, and the scheduler should be aware of which node is the best candidate.
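
To make the "best candidate" idea concrete, here is a minimal sketch of a locality score, loosely in the spirit of imagelocality. It is only an illustration under assumptions: the `Node` type, the `cached_models` inventory, the `model_locality_score` and `best_node` helpers, and the model name are all hypothetical, and a real plugin would be written in Go against the scheduler framework rather than in Python. The only thing borrowed from the scheduler is the 0..100 node score range.

```python
from dataclasses import dataclass, field

MAX_NODE_SCORE = 100  # kube-scheduler scores nodes on a 0..100 scale


@dataclass
class Node:
    name: str
    # Hypothetical inventory: model name -> size in bytes of the cached copy.
    cached_models: dict[str, int] = field(default_factory=dict)


def model_locality_score(node: Node, required: dict[str, int]) -> int:
    """Score a node by the share of required model bytes it already caches."""
    total = sum(required.values())
    if total == 0:
        return 0  # the pod needs no models, so locality gives no preference
    cached = sum(size for name, size in required.items()
                 if name in node.cached_models)
    return MAX_NODE_SCORE * cached // total


def best_node(nodes: list[Node], required: dict[str, int]) -> Node:
    # max() returns the first maximal element, so ties keep input order.
    return max(nodes, key=lambda n: model_locality_score(n, required))


# Hypothetical cluster: only node-b already holds the model weights.
nodes = [
    Node("node-a"),
    Node("node-b", {"llama-3-8b": 16_000_000_000}),
]
required = {"llama-3-8b": 16_000_000_000}
print(best_node(nodes, required).name)  # prints "node-b"
```

Weighting by bytes rather than by model count means a node holding the one large model outranks a node holding several small ones, which is what matters for pull time.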

However, right now I'm developing a P2P-based model distribution project. You can think of it as another lightweight Dragonfly, but one that mostly works for model weights. See https://github.com/InftyAI/Manta. Once this is finished, model-aware scheduling will not be that urgent, because models will be transmitted across nodes, but it is still an option.

Sep 23 '24 02:09 kerthcet