Running on GPU way slower than running on CPU
LocalAI version: Latest (v1.25.0)
Environment, CPU architecture, OS, and Version: GPU: NVIDIA GeForce MX250 (9.9 GB), RAM: 15.8 GB
Describe the bug
I tried running LocalAI with the --gpus all flag:
docker run -ti --gpus all -p 8080:8080 -e "DEBUG=true" -e "THREADS=14" -e "REBUILD=false" -v MODEL_PATH:/build/models:cached quay.io/go-skynet/local-ai:v1.25.0-cublas-cuda11
For the same chat completion request, the GPU takes 34 minutes while the CPU takes only 5 minutes.
Expected behavior: Using the GPU should be faster.
Logs
I noticed something interesting in these log messages:
WARNING: failed to allocate 1024.00 MB of pinned memory: out of memory
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
Question
- What is happening?
- Is there any way to configure "pinned memory"?
Can you please post your models yaml file?
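For reference, here is a minimal sketch of what a GPU-enabled model definition could look like (field names follow the LocalAI GPU-acceleration docs; the model filename and layer count below are placeholders, and on an MX250, which typically only has 2 GB of dedicated VRAM, gpu_layers likely needs to stay small):

name: gpt-3.5-turbo
backend: llama
context_size: 2048
f16: true
gpu_layers: 10   # layers offloaded to VRAM; keep this low on a small card
parameters:
  model: your-model.ggmlv3.q4_0.bin   # placeholder filename

If gpu_layers is missing or set to 0, the cuBLAS image still runs, but most of the work stays on the CPU while data is shuffled to and from the GPU, which can end up slower than plain CPU inference.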
In case you use Windows as a Docker host, make sure your models live inside the Linux filesystem and not the Windows filesystem. The transfer rate from a Windows-hosted filesystem is really slow when running things inside WSL.
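Another thing that might be worth trying, assuming the llama.cpp backend bundled in this image honors the GGML_CUDA_NO_PINNED environment variable: disable pinned (page-locked) host memory, which is exactly what the warnings above fail to allocate, and see whether the timings change:

docker run -ti --gpus all -p 8080:8080 -e "DEBUG=true" -e "THREADS=14" -e "REBUILD=false" -e "GGML_CUDA_NO_PINNED=1" -v MODEL_PATH:/build/models:cached quay.io/go-skynet/local-ai:v1.25.0-cublas-cuda11

When a pinned allocation fails, llama.cpp falls back to ordinary pageable memory, so the warnings themselves are not fatal, but they do point to host memory pressure during host-to-GPU transfers.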
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I am going to close this issue for now, please tag me if this is in error