OmniQuant
Lazy loading
Could you implement lazy loading in the MLCChat app, the way llama.cpp does? On low-RAM devices the app crashes instantly when it tries to generate text.