Stijn

Results: 20 comments by Stijn

I also got a VRAM error when loading Phi-3, even though about 2 GB more was available than the 9.8 GB shown in the terminal. Is it possible to set the VRAM limit to `max_available_at_initiating` or something...

It is a 16 GB M1 Air. Do you happen to know a ballpark figure for the limit, or does it depend dynamically on other processes? I was running a Phi-3-128k-mlx mlx_lm.utils...

``` air@MacBook-Air-van-Air test-repo % /opt/homebrew/bin/python3.10 /Users/air/Repositories/test-repo/test4.py 0 GB 1 GB 2 GB 3 GB 4 GB 5 GB 6 GB 7 GB 8 GB 9 GB libc++abi: terminating due...

Same problem here.

I probably have a similar problem. The `generate` function outputs a single key per token. Here is some pseudocode for the problem: ``` from transformers import AutoTokenizer tokenizer...

It is divisible by 16. Would support for that be complicated to implement?

Do you have a link to Florence-2?

Thank you for the addition and the guide. Does this correctly split the markdown part from the rest of the response, even though there are a lot of XML tags?