jolonf
On the 11-inch iPad Pro with M1, I am getting a decode speed of 10.6 tok/s (I have seen slightly higher and lower). It is running iPadOS 16.1.
Perhaps bsv could be added as a peer dependency?
I'm writing an LLM server and I want to implement a prompt cache, but I'm having trouble with the Sendable boundary and KVCache / TokenIterator. I'm wondering if you have...
I've created an issue here: https://github.com/ml-explore/mlx-swift-examples/issues/310
To be honest I hadn't thought much about a longer lifecycle of the cache. I saw in `cache.py` there is a `save_prompt_cache()` which would allow it to persist across application...
> I don't think we want to serialize the KVCache into something that is Sendable.

I agree, let me try a few of those suggestions...
I think I am getting my head around this. If we use an actor, the only way that the `[KVCache]` array can be mutated is with an `inout` parameter and...
I've got an implementation that is working and I've integrated it into the MLXChatExample. The changes were cleaner than I was expecting but there are a few caveats. https://github.com/jolonf/mlx-swift-examples/tree/feature/prompt-caching I've...
Just to clarify, the only required change to the project to support caching is adding the `[KVCache]` parameter to the `generate()` functions and adding `isTrimmable()` and `trim()` to `KVCache`/`KVCacheSimple`. Technically...
The changes to `Evaluate.swift` and `KVCache.swift` are small and benign, so they could be incorporated now. Apps could provide their own `PromptCache` until we include one. Or we could include the...
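To make the idea of an app-provided `PromptCache` concrete, here is a minimal, self-contained sketch of prefix reuse built on `isTrimmable()` and `trim()`. Everything in it is an assumption for illustration: `ToyCache` is a stand-in that only counts tokens (it is not the MLX `KVCache`), the `trim(_:)` signature and return value are guessed, and `prepare(for:)`, `didProcess(_:)` and `commonPrefixLength` are hypothetical names, not the API in the linked branch. It also leaves out actor isolation and the Sendable question entirely.

```swift
// Toy stand-in for one layer's key/value cache. It only tracks how many
// tokens it holds. Hypothetical: not the MLX KVCache type.
final class ToyCache {
    private(set) var offset = 0

    func append(count: Int) { offset += count }

    func isTrimmable() -> Bool { true }

    // Drop up to `n` of the most recent tokens; return how many were dropped.
    @discardableResult
    func trim(_ n: Int) -> Int {
        let dropped = max(0, min(n, offset))
        offset -= dropped
        return dropped
    }
}

// Length of the shared leading run of two token sequences.
func commonPrefixLength(_ a: [Int], _ b: [Int]) -> Int {
    var i = 0
    while i < a.count, i < b.count, a[i] == b[i] { i += 1 }
    return i
}

// Hypothetical app-side prompt cache: remembers which tokens the layer
// caches already cover, and reuses the longest matching prefix.
final class PromptCache {
    private(set) var cachedTokens: [Int] = []
    private(set) var layers: [ToyCache]
    private let makeLayers: () -> [ToyCache]

    init(makeLayers: @escaping () -> [ToyCache]) {
        self.makeLayers = makeLayers
        self.layers = makeLayers()
    }

    // Returns the part of `prompt` that still has to be run through the model.
    func prepare(for prompt: [Int]) -> ArraySlice<Int> {
        var keep = commonPrefixLength(cachedTokens, prompt)
        // Leave at least one token to process so there is something to evaluate.
        if keep == prompt.count && keep > 0 { keep -= 1 }

        let excess = cachedTokens.count - keep
        if excess > 0 {
            if layers.allSatisfy({ $0.isTrimmable() }) {
                // Roll every layer back to the shared prefix.
                for layer in layers { layer.trim(excess) }
            } else {
                // Cannot roll back: start over with fresh caches.
                layers = makeLayers()
                keep = 0
            }
        }
        cachedTokens = Array(prompt[..<keep])
        return prompt[keep...]
    }

    // Record tokens after they were processed. In a real integration the
    // model would fill the layer caches; the toy just counts them here.
    func didProcess(_ newTokens: [Int]) {
        cachedTokens += newTokens
        for layer in layers { layer.append(count: newTokens.count) }
    }
}
```

A caller would ask `prepare(for:)` for the unprocessed suffix of each new prompt, run only that suffix, then report it back with `didProcess(_:)`. For a chat app this means a follow-up turn that extends the previous conversation only pays for the new tokens, while an edited earlier message causes the caches to be trimmed back to the point of divergence.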