chat / KVCache requires re-prepare of media
See also #277
Although KVCache is useful for avoiding recomputation of past state, the full input to the VLM still has to be rebuilt on every call -- this includes re-preparing the images and video. It would be nice if we could encapsulate that state somehow.
I'm writing an LLM server and I want to implement a prompt cache, but I'm having trouble with the Sendable boundary around KVCache / TokenIterator. Do you have any suggestions?
I'm thinking that ideally the KVCache needs to be stored in the ModelContainer actor or in the ModelContext?
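One way to sidestep the Sendable boundary is to have an actor own the per-conversation state, so the cache never crosses an isolation boundary at all. The sketch below is a minimal illustration of that pattern, not the project's API: `PromptCacheEntry` and `PromptCacheStore` are hypothetical names, and the real KVCache / prepared-media state (omitted here) would live inside the entry alongside the token list.

```swift
import Foundation

// Placeholder for the non-Sendable per-conversation state. In practice this
// would also hold the KVCache and any prepared image/video embeddings.
final class PromptCacheEntry {
    var tokens: [Int] = []
}

// Hypothetical actor that owns all cache entries. Because entries never leave
// the actor, they never need to be Sendable; callers use async methods.
actor PromptCacheStore {
    private var entries: [String: PromptCacheEntry] = [:]

    // Returns the length of the cached prefix shared with `tokens`, and
    // updates the stored entry to the new token sequence. The first `common`
    // tokens can then skip recomputation.
    func reuse(conversation id: String, tokens: [Int]) -> Int {
        let entry = entries[id] ?? PromptCacheEntry()
        let common = zip(entry.tokens, tokens).prefix { $0.0 == $0.1 }.count
        entry.tokens = tokens
        entries[id] = entry
        return common
    }
}
```

With this shape, generation itself would also run inside the owning actor (in mlx-swift-examples, presumably inside `ModelContainer.perform`), so the KVCache and prepared media are mutated in one isolation domain and only Sendable results cross out.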
@jolonf please make a new issue for this -- we can discuss.
I've created an issue here: https://github.com/ml-explore/mlx-swift-examples/issues/310