
Will sharing models via hostPath lead to security problems?

Open kerthcet opened this issue 1 year ago • 11 comments

At first glance it may be OK, because the Models are published by the admins, so the data source is under supervision.

Or is this something users actually need?
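For context, a minimal sketch of the pattern under discussion (image name and paths are hypothetical): the admin-managed model directory on each node is mounted into serving Pods via hostPath. Mounting readOnly limits tampering from inside the container, but any workload allowed to declare hostPath volumes can still read arbitrary node paths, which is the security concern here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
  - name: server
    image: inference-server:latest   # hypothetical serving image
    volumeMounts:
    - name: models
      mountPath: /workspace/models
      readOnly: true                 # serving only needs read access
  volumes:
  - name: models
    hostPath:
      path: /mnt/models              # admin-managed directory on the node
      type: Directory
```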

kerthcet avatar Jul 24 '24 08:07 kerthcet

/kind question

kerthcet avatar Jul 24 '24 08:07 kerthcet

Another problem is memory management: the cached model size will increase over time. I have no idea whether Hugging Face manages the cache with LRU or other algorithms.

kerthcet avatar Aug 07 '24 03:08 kerthcet

Another concern is that people may not want to mount the hostPath, so we should provide an option for them. A P2P accelerator may help them as well.

kerthcet avatar Aug 18 '24 15:08 kerthcet

However, the best option would be to treat model weights like images, so the kubelet can help manage the lifecycle of the weights; I have no idea whether the image volume supports this or not. But some models may be stored in an object store and can't be downloaded by containerd or CRI-O, so we would have to manage them manually. With that concern, it seems we should mount them to the hostPath and manage them with a new daemon process.
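For reference, Kubernetes has an alpha `image` volume source (v1.31+, feature-gated) that lets the kubelet pull an OCI artifact and mount it read-only, so weights packaged as OCI artifacts would share the image cache and GC machinery. A sketch, with hypothetical image and registry names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
  - name: server
    image: inference-server:latest      # hypothetical serving image
    volumeMounts:
    - name: weights
      mountPath: /models
  volumes:
  - name: weights
    image:                              # alpha "image" volume source
      reference: registry.example.com/models/llama:latest  # hypothetical OCI artifact
      pullPolicy: IfNotPresent          # reuse the cached artifact across Pods
```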

kerthcet avatar Aug 18 '24 16:08 kerthcet

Validation TODO:

  • with image volume, model weights can be cached
  • with image volume, model weights can be GCed

kerthcet avatar Aug 18 '24 16:08 kerthcet

As mentioned, namespace isolation is not tackled if we support namespaced models in the future.

kerthcet avatar Aug 19 '24 03:08 kerthcet

We should be able to set the cache size if we want to enable the GC mechanism.
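As a point of comparison, the kubelet already garbage-collects container images by disk-usage thresholds rather than a fixed cache size; a model cache could expose a similar knob. These are real KubeletConfiguration fields, shown here only as an analogy:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # start image GC once disk usage exceeds 85%
imageGCLowThresholdPercent: 80    # free space until usage drops back to 80%
```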

kerthcet avatar Aug 19 '24 03:08 kerthcet

Maybe the first step is to switch the volume source to PV & PVC; then, if we want to support other file storages like NFS in the future, it will be more convenient.
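A sketch of that decoupling, assuming a statically provisioned PV (names and sizes hypothetical): Pods bind to the claim, so the backend can later change from hostPath to NFS or anything else without touching Pod specs.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-store
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadOnlyMany"]
  storageClassName: ""            # static binding, no dynamic provisioner
  hostPath:
    path: /mnt/models             # today's backend; swappable for NFS later
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store-claim
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: ""
  volumeName: model-store         # bind to the PV above
  resources:
    requests:
      storage: 100Gi
```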

kerthcet avatar Aug 19 '24 07:08 kerthcet

Validation TODO:

  • with image volume, model weights can be cached
  • with image volume, model weights can be GCed

This is true, I think.

kerthcet avatar Aug 19 '24 07:08 kerthcet

Some projects download models with APIs and schedule Pods to nodes that already have them cached. However, I think this is less convenient, because a node can serve several kinds of LLMs, so how can we know which model we need? And the lifecycle is bound: when the API object is deleted, the model files are deleted too. That is unlike images, which are controlled by a disk-usage threshold; I believe that is the right way, because a model file can be reused if a user deploys the model again.

kerthcet avatar Aug 19 '24 07:08 kerthcet

Anyway, we may need a DaemonSet process to manage the models and also report model usage in the node info for scheduling.
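A minimal sketch of such a node-level agent (all names hypothetical): one Pod per node owns the model directory, downloading and evicting files and reporting per-node cache contents for the scheduler.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: model-manager
spec:
  selector:
    matchLabels:
      app: model-manager
  template:
    metadata:
      labels:
        app: model-manager
    spec:
      containers:
      - name: manager
        image: model-manager:latest   # hypothetical agent image
        volumeMounts:
        - name: models
          mountPath: /mnt/models      # agent owns the cache directory
      volumes:
      - name: models
        hostPath:
          path: /mnt/models
          type: DirectoryOrCreate
```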

kerthcet avatar Aug 19 '24 07:08 kerthcet

Right now, the hostpath is configured by users, so they should be aware of the risks in advance. Let's close this for now. /close

kerthcet avatar Jan 23 '25 15:01 kerthcet