Richard Ginsberg

4 comments by Richard Ginsberg

> This should be resolved by #3218

Just tested v0.1.30, and the issue is still present. ![ollama-p40-issues-2](https://github.com/ollama/ollama/assets/819865/22751e8b-5af7-445a-bb5f-a7e6291a607f)

Above I confirmed the issue persists in v0.1.30. To confirm it wasn't introduced in v0.1.30, I tried v0.1.29 and hit the same issue. `docker run -d --gpus=all -v /home/username/ollama:/root/.ollama -p 11434:11434 --name...`

FastChat streams output tokens on a different endpoint/module. Hoping it's on the roadmap to port streaming to fastchat.serve.openai_api_server.
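For context, an OpenAI-compatible server like fastchat.serve.openai_api_server streams tokens as server-sent events, one `data:` line per chunk, terminated by `data: [DONE]`. A minimal sketch of extracting the token deltas from such a stream (the helper name is mine, not FastChat's API; the chunk shape follows the OpenAI chat-completions streaming format):

```python
import json


def parse_sse_tokens(lines):
    """Collect content tokens from OpenAI-style streaming SSE lines.

    Each streamed line looks like:
        data: {"choices": [{"delta": {"content": "Hi"}}]}
    and the stream is terminated by:
        data: [DONE]
    """
    tokens = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        token = delta.get("content")
        if token:
            tokens.append(token)
    return tokens
```

In practice the `lines` iterable would come from something like `requests.post(url, json={..., "stream": True}, stream=True).iter_lines()`.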

> @flexchar in the file
>
> https://github.com/m-bain/whisperX/blob/dbeb8617f298bb4b5847d771bfb600379255c860/whisperx/vad.py#L46
>
> there is a hash check of the loaded model.
> I was able to trace the use of the function...
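A hash check of that kind typically compares the SHA-256 digest of the downloaded model file against a pinned hex digest. A minimal sketch of the idea (the function name and signature here are illustrative, not whisperX's actual code):

```python
import hashlib


def model_checksum_matches(model_bytes: bytes, expected_sha256: str) -> bool:
    """Return True if the SHA-256 digest of the model blob matches
    the expected hex digest (case-insensitive)."""
    actual = hashlib.sha256(model_bytes).hexdigest()
    return actual == expected_sha256.lower()
```

If the check fails, a loader would usually refuse the file and re-download it, which is why a stale or corrupted cached model can trigger repeated download attempts.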