jolonf comments

Results 17 comments of



[Survey] Supported Hardwares and Speed

On the 11-inch iPad Pro with M1, running iPadOS 16.1, I am getting a decode speed of 10.6 tok/s (I have seen slightly higher and lower).

Add 'bsv' dependency

Perhaps bsv could be added as a peer dependency?

chat / KVCache requires re-prepare of media

I'm writing an LLM server and want to implement a prompt cache, but I'm having trouble with the Sendable boundary and KVCache / TokenIterator. I'm wondering if you have...

chat / KVCache requires re-prepare of media

I've created an issue here: https://github.com/ml-explore/mlx-swift-examples/issues/310

Feature: Prompt cache

To be honest I hadn't thought much about a longer lifecycle of the cache. I saw in `cache.py` there is a `save_prompt_cache()` which would allow it to persist across application...

Feature: Prompt cache

> I don't think we want to serialize the KVCache into something that is Sendable.

I agree, let me try a few of those suggestions...

Feature: Prompt cache

I think I am getting my head around this. If we use an actor, the only way that the `[KVCache]` array can be mutated is with an `inout` parameter and...

I've got a working implementation and I've integrated it into the MLXChatExample. The changes were cleaner than I was expecting, but there are a few caveats. https://github.com/jolonf/mlx-swift-examples/tree/feature/prompt-caching I've...

Feature: Prompt cache

Just to clarify, the only required change to the project to support caching is adding the `[KVCache]` parameter to the `generate()` functions and adding `isTrimmable()` and `trim()` to `KVCache`/`KVCacheSimple`. Technically...
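
To illustrate what `isTrimmable()` and `trim()` might look like on a cache, here is a minimal, self-contained sketch. `TrimmableCache` and `ToyCache` are hypothetical stand-ins invented for this example; they are not the MLX Swift `KVCache` protocol or `KVCacheSimple`, and the real signatures may differ.

```swift
// Hypothetical stand-in for a KV cache interface (an assumption, not the MLX Swift API).
protocol TrimmableCache: AnyObject {
    /// Number of token positions currently held in the cache.
    var offset: Int { get }
    /// Whether entries can be dropped from the end of the cache.
    func isTrimmable() -> Bool
    /// Drops up to `count` entries from the end and returns how many were dropped.
    func trim(_ count: Int) -> Int
}

// Toy implementation that stores token ids instead of key/value tensors.
final class ToyCache: TrimmableCache {
    private var entries: [Int] = []

    var offset: Int { entries.count }

    func append(_ tokens: [Int]) {
        entries.append(contentsOf: tokens)
    }

    func isTrimmable() -> Bool { true }

    func trim(_ count: Int) -> Int {
        // Clamp so we never drop more entries than the cache holds.
        let dropped = max(0, min(count, entries.count))
        entries.removeLast(dropped)
        return dropped
    }
}

let cache = ToyCache()
cache.append([1, 2, 3, 4, 5])

// Drop the last two positions, but only if the cache supports trimming.
let dropped = cache.isTrimmable() ? cache.trim(2) : 0
print(dropped, cache.offset)
```

A caller that wants to reuse a cache for a new prompt would presumably trim it back to the point where the old and new prompts diverge, which is why both methods are needed.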

Feature: Prompt cache

The changes to `Evaluate.swift` and `KVCache.swift` are small and benign; they could be incorporated now. Apps could provide their own `PromptCache` until we include one. Or we could include the...
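
As a rough idea of what an app-provided `PromptCache` could do, the sketch below tracks only the token ids of the last prompt and works out which tokens still need processing. It assumes a prefix-matching strategy, which the comment above does not spell out; the type, its `suffixToProcess(for:)` method, and the token values are all hypothetical and not taken from the linked branch.

```swift
// Hypothetical app-side bookkeeping for a prompt cache (assumed design, not the branch's code).
struct PromptCache {
    /// Token ids the underlying KV cache is assumed to already hold.
    private(set) var cachedTokens: [Int] = []

    /// Returns the part of `prompt` that is not covered by the cached prefix,
    /// then records `prompt` as the new cached content.
    mutating func suffixToProcess(for prompt: [Int]) -> [Int] {
        // Count how many leading tokens the cached prompt and the new prompt share.
        var common = 0
        while common < cachedTokens.count,
              common < prompt.count,
              cachedTokens[common] == prompt[common] {
            common += 1
        }
        // A real implementation would trim the KV cache here by
        // `cachedTokens.count - common` entries before continuing.
        cachedTokens = prompt
        return Array(prompt[common...])
    }
}

var promptCache = PromptCache()
let first = promptCache.suffixToProcess(for: [10, 11, 12])          // nothing cached yet
let second = promptCache.suffixToProcess(for: [10, 11, 12, 13, 14]) // extends the cached prompt
let third = promptCache.suffixToProcess(for: [10, 11, 99])          // diverges after two tokens
print(first, second, third)
```

Keeping this bookkeeping outside the library means each app can choose its own policy for when to reuse or discard a cache.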