
Accelerate model loading

kerthcet opened this issue 1 year ago • 2 comments

What would you like to be added:

Generally,

  • if users use object stores, they can use Fluid as a distributed caching system
  • if users use OCI images, they can use Dragonfly for P2P acceleration

However, there are two gaps here:

  • When users download model weights from a model hub, we should download them only once and then reuse the cache, but we can't achieve this today. One approach is to download the model weights to the file system or a cache system.
  • This brings another problem: if people don't have these two additional components, we should still have a way to accelerate model loading by default.
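The download-once-then-cache behavior described above could be sketched roughly as follows. This is a minimal illustration, not llmaz's actual implementation; the function and directory names are hypothetical, and the real download step (pulling from a model hub) is passed in as a callable:

```python
from pathlib import Path
import tempfile

def load_weights(model_id, cache_dir, download_fn):
    """Download model weights only on first use; later calls hit the cache.

    `download_fn` stands in for the expensive model-hub download.
    """
    target = Path(cache_dir) / model_id.replace("/", "--")
    if not target.exists():
        target.mkdir(parents=True)
        download_fn(model_id, target)  # expensive: pull from the model hub
    return target

# Demonstrate that the hub is hit only once across repeated loads.
calls = []
def fake_download(model_id, dest):
    calls.append(model_id)
    (dest / "weights.bin").write_bytes(b"\x00")

with tempfile.TemporaryDirectory() as cache:
    load_weights("org/model", cache, fake_download)
    load_weights("org/model", cache, fake_download)  # served from cache
print(len(calls))  # -> 1, i.e. downloaded once
```

A real version would also need concurrency-safe writes (e.g. download to a temp path, then rename) so that parallel pods don't see a half-written cache entry.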

Why is this needed:

Minimize the configuration while still enjoying the acceleration.

Completion requirements:

This enhancement requires the following artifacts:

  • [ ] Design doc
  • [ ] API change
  • [ ] Docs update

The artifacts should be linked in subsequent comments.

kerthcet avatar Aug 21 '24 16:08 kerthcet

/kind feature

kerthcet avatar Aug 21 '24 16:08 kerthcet

/priority important-soon

kerthcet avatar Aug 21 '24 16:08 kerthcet

/close duplicate of https://github.com/InftyAI/llmaz/issues/119

kerthcet avatar Nov 14 '24 09:11 kerthcet