Jorge Diogo
This would really be a good thing and would allow structured extraction libraries like [Sibila](https://github.com/jndiogo/sibila) to integrate Ollama as a provider. Even if the JSON Schema to grammar converter has limitations...
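A minimal sketch of what such an integration could look like once Ollama accepts a JSON Schema object in the `format` field of `/api/chat` (this is an assumption; the model name and schema below are purely illustrative):

```python
import json
import requests

# Illustrative schema: constrain the model to a {name, age} object.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # hypothetical model name; use whatever you have pulled
        "messages": [{"role": "user", "content": "Extract: John is 42 years old."}],
        "format": schema,   # assumed: a JSON Schema-valued format field
        "stream": False,
    },
)
data = json.loads(resp.json()["message"]["content"])
print(data)  # e.g. {"name": "John", "age": 42}
```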
In my experience, @jkawamoto's approach is a good one, because it frees RAM/CUDA/other memory even if the Llama object is stuck. I've tried calling `del llama_model`, but this is not...
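A minimal sketch of releasing model memory explicitly, assuming a llama-cpp-python version that exposes `Llama.close()`; on older versions the fallback is `del` plus a forced garbage collection (the model path is illustrative):

```python
import gc
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical path
# ... run inference ...

if hasattr(llm, "close"):
    llm.close()   # explicitly frees the llama.cpp context and model weights
else:
    del llm       # del alone only drops the Python reference...
    gc.collect()  # ...so force a collection to run the destructor promptly
```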
For local models, Sibila depends on [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), which is the only component that can cause a segmentation fault. Try using the latest version of llama-cpp-python and make sure it's using...
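A quick way to confirm which llama-cpp-python build is actually loaded, using only its public attributes (the model path is illustrative; `verbose=True` makes llama.cpp print its build and backend info at load time):

```python
import llama_cpp
print(llama_cpp.__version__)  # confirm the installed version

from llama_cpp import Llama
# verbose=True logs which backend (CPU/CUDA/Metal) llama.cpp was compiled with
llm = Llama(model_path="model.gguf", verbose=True)  # hypothetical path
```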