chat / KVCache requires re-prepare of media
See also #277
Although KVCache is useful for avoiding recomputation of past state, the full input to the VLM still has to be rebuilt on every call -- this includes re-preparing the images and video. It would be nice if we could encapsulate that state somehow.
I'm writing an LLM server and I want to implement a prompt cache, but I'm having trouble with the Sendable boundary around KVCache / TokenIterator. Do you have any suggestions?
I'm thinking that ideally the KVCache needs to be stored in the ModelContainer actor or in the ModelContext?
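One way to sidestep the Sendable boundary is to have an actor own the per-conversation state, so the cache never crosses an isolation boundary at all. The sketch below is a minimal illustration of that pattern, not the project's API: `PromptCacheEntry` and `PromptCacheStore` are hypothetical names, and the real KVCache / prepared-media state (omitted here) would live inside the entry alongside the token list.

```swift
import Foundation

// Placeholder for the non-Sendable per-conversation state. In practice this
// would also hold the KVCache and any prepared image/video embeddings.
final class PromptCacheEntry {
    var tokens: [Int] = []
}

// Hypothetical actor that owns all cache entries. Because entries never leave
// the actor, they never need to be Sendable; callers use async methods.
actor PromptCacheStore {
    private var entries: [String: PromptCacheEntry] = [:]

    // Returns the length of the cached prefix shared with `tokens`, and
    // updates the stored entry to the new token sequence. The first `common`
    // tokens can then skip recomputation.
    func reuse(conversation id: String, tokens: [Int]) -> Int {
        let entry = entries[id] ?? PromptCacheEntry()
        let common = zip(entry.tokens, tokens).prefix { $0.0 == $0.1 }.count
        entry.tokens = tokens
        entries[id] = entry
        return common
    }
}
```

With this shape, generation itself would also run inside the owning actor (in mlx-swift-examples, presumably inside `ModelContainer.perform`), so the KVCache and prepared media are mutated in one isolation domain and only Sendable results cross out.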
@jolonf please make a new issue for this -- we can discuss.
I've created an issue here: https://github.com/ml-explore/mlx-swift-examples/issues/310