LocalAI icon indicating copy to clipboard operation
LocalAI copied to clipboard

Running on GPU way slower than running on CPU

Open sandriansandy opened this issue 2 years ago • 3 comments

LocalAI version: Latest (v1.25.0)

Environment, CPU architecture, OS, and Version: GPU : NVIDIA GeForce MX250 (9.9 GB) CPU : 15.8 GB

Describe the bug I tried running LocalAI using flag --gpus all : docker run -ti --gpus all -p 8080:8080 -e "DEBUG=true" -e "THREADS=14" -e "REBUILD=false" -v MODEL_PATH:/build/models:cached quay.io/go-skynet/local-ai:v1.25.0-cublas-cuda11 For the same message chat completion, using GPU takes 34 minutes, using CPU takes only 5 minutes

Expected behavior Using GPU is faster

Logs I notice something interesting with this log messages WARNING: failed to allocate 1024.00 MB of pinned memory: out of memory WARNING: failed to allocate 512.00 MB of pinned memory: out of memory

Question

  1. What is happen?
  2. Is there any way to config "pinned memory"?

sandriansandy avatar Sep 11 '23 10:09 sandriansandy

LocalAI version: Latest (v1.25.0)

Environment, CPU architecture, OS, and Version: GPU : NVIDIA GeForce MX250 (9.9 GB) CPU : 15.8 GB

Describe the bug I tried running LocalAI using flag --gpus all : docker run -ti --gpus all -p 8080:8080 -e "DEBUG=true" -e "THREADS=14" -e "REBUILD=false" -v MODEL_PATH:/build/models:cached quay.io/go-skynet/local-ai:v1.25.0-cublas-cuda11 For the same message chat completion, using GPU takes 34 minutes, using CPU takes only 5 minutes

Expected behavior Using GPU is faster

Logs I notice something interesting with this log messages WARNING: failed to allocate 1024.00 MB of pinned memory: out of memory WARNING: failed to allocate 512.00 MB of pinned memory: out of memory

Question

  1. What is happen?
  2. Is there any way to config "pinned memory"?

Can you please post your models yaml file?

lunamidori5 avatar Sep 15 '23 04:09 lunamidori5

In case you use Windows as a Docker Host, make sure your models live inside of the linux fs and not the windows fs. The Transferrate from Windows as a File System host is really slow if you want to run stuff inside of wsl.

MrKinauJr avatar Sep 27 '23 07:09 MrKinauJr

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Dec 03 '25 02:12 github-actions[bot]

I am going to close this issue for now, please tag me if this is in error

lunamidori5 avatar Dec 03 '25 17:12 lunamidori5