
Accelerate model loading

kerthcet opened this issue 1 year ago • 2 comments

What would you like to be added:

Generally,

  • if users use object stores, they can use Fluid as a distributed caching system
  • if users use OCI images, they can use Dragonfly for P2P acceleration

However, there are two gaps here:

  • When users download model weights from a model hub, we should download them only once and then reuse the cache, but we can't achieve this today. One approach is to download the model weights to the file system or a cache system.
  • This brings another problem: if people don't have these two additional components, we should still have a way to accelerate model loading by default.
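The download-once-then-cache behavior described above could be sketched roughly as follows. This is a minimal illustration, not llmaz's actual implementation; the function and directory names are hypothetical, and the real download step (pulling from a model hub) is passed in as a callable:

```python
from pathlib import Path
import tempfile

def load_weights(model_id, cache_dir, download_fn):
    """Download model weights only on first use; later calls hit the cache.

    `download_fn` stands in for the expensive model-hub download.
    """
    target = Path(cache_dir) / model_id.replace("/", "--")
    if not target.exists():
        target.mkdir(parents=True)
        download_fn(model_id, target)  # expensive: pull from the model hub
    return target

# Demonstrate that the hub is hit only once across repeated loads.
calls = []
def fake_download(model_id, dest):
    calls.append(model_id)
    (dest / "weights.bin").write_bytes(b"\x00")

with tempfile.TemporaryDirectory() as cache:
    load_weights("org/model", cache, fake_download)
    load_weights("org/model", cache, fake_download)  # served from cache
print(len(calls))  # -> 1, i.e. downloaded once
```

A real version would also need concurrency-safe writes (e.g. download to a temp path, then rename) so that parallel pods don't see a half-written cache entry.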

Why is this needed:

Minimize the configuration while still enjoying the acceleration.

Completion requirements:

This enhancement requires the following artifacts:

  • [ ] Design doc
  • [ ] API change
  • [ ] Docs update

The artifacts should be linked in subsequent comments.

kerthcet avatar Aug 21 '24 16:08 kerthcet

/kind feature

kerthcet avatar Aug 21 '24 16:08 kerthcet

/priority important-soon

kerthcet avatar Aug 21 '24 16:08 kerthcet

/close duplicate of https://github.com/InftyAI/llmaz/issues/119

kerthcet avatar Nov 14 '24 09:11 kerthcet