Stijn
Also got a VRAM error while ~2 GB more was available than the 9.8 GB shown in the terminal when loading Phi-3. Is it possible to set the VRAM limit to `max_available_at_initiating` or something...
It is a 16 GB Air M1; do you happen to know a ballpark for the limit? Or is it dynamically dependent on other processes? I was running a Phi-3-128k-mlx mlx_lm.utils...
```
air@MacBook-Air-van-Air test-repo % /opt/homebrew/bin/python3.10 /Users/air/Repositories/test-repo/test4.py
0 GB 1 GB 2 GB 3 GB 4 GB 5 GB 6 GB 7 GB 8 GB 9 GB
libc++abi: terminating due...
```
Same problem here.
Probably, I have kind of the same problem. The `generate` function outputs a single key per token; here is some pseudocode for the problem:
```
from transformers import AutoTokenizer
tokenizer...
```
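A common workaround for this kind of per-token output problem (my own sketch, not something from this thread) is to re-decode the whole token prefix each step and emit only the newly added text, since decoding tokens one at a time can drop merges and spaces. Minimal sketch with a toy vocabulary standing in for a real tokenizer such as `AutoTokenizer`:

```python
# Toy stand-in for a real tokenizer vocabulary (hypothetical values).
TOY_VOCAB = {0: "Hel", 1: "lo", 2: " wor", 3: "ld"}

def decode(token_ids):
    # Stand-in for tokenizer.decode(): joins sub-word pieces.
    return "".join(TOY_VOCAB[t] for t in token_ids)

def stream_decode(token_ids):
    """Yield only the text added by each new token.

    Decodes the full prefix every step, so context-dependent pieces
    (merged sub-words, leading spaces) come out correctly.
    """
    emitted = ""
    seen = []
    for tok in token_ids:
        seen.append(tok)
        full = decode(seen)
        # Emit only the delta relative to what was already printed.
        delta = full[len(emitted):]
        emitted = full
        yield delta

print("".join(stream_decode([0, 1, 2, 3])))  # prints "Hello world"
```

This trades a bit of repeated decoding work for correctness; real streamers (e.g. the detokenizer helpers in `transformers`) use the same full-prefix-and-delta idea with caching.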
It can be divided by 16; would that be complicated to implement?
Same problem here.
Do you have a link to Florence-2?
Nope, thank you!
Thank you for the addition and the guide. Does this correctly split the markdown part from the total response, even though there are a lot of XML tags?