
Error when running Gemma inference on GPU

Open LarryHawkingYoung opened this issue 2 years ago • 1 comments

When I run

docker run -t --rm \
    --gpus all \
    -v ${CKPT_PATH}:/tmp/ckpt \
    ${DOCKER_URI} \
    python scripts/run.py \
    --device=cuda \
    --ckpt=/tmp/ckpt \
    --variant="${VARIANT}" \
    --prompt="${PROMPT}"

It returns the error: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
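For reference, this particular daemon error usually means Docker itself cannot find a GPU runtime (typically because the NVIDIA Container Toolkit is not installed or not configured), independent of the model being run. A sketch of the usual check-and-fix on an apt-based system (the CUDA image tag below is just an example):

```shell
# Check whether the Docker daemon knows about an NVIDIA runtime
docker info | grep -i nvidia

# Install the NVIDIA Container Toolkit (Ubuntu/Debian; other distros differ)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: run nvidia-smi inside a CUDA container with --gpus all
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If the final sanity check prints the GPU table, the original `docker run ... --gpus all` command should no longer hit this error.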

However, if I run on CPU with the command:

docker run -t --rm \
    -v ${CKPT_PATH}:/tmp/ckpt \
    ${DOCKER_URI} \
    python scripts/run.py \
    --ckpt=/tmp/ckpt \
    --variant="${VARIANT}" \
    --prompt="${PROMPT}"

it works fine.

LarryHawkingYoung avatar Mar 18 '24 12:03 LarryHawkingYoung

What model variant did you use and what GPU did you use?

One guess is that you may be running out of GPU memory if you try to run the un-quantized 7B model on a 16GB GPU. Either the quantized 7B model or a 2B model should work.
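A back-of-envelope estimate supports this guess: the weights alone of a 7B model in fp16/bf16 nearly fill a 16 GiB card before activations and the KV cache are counted. A minimal sketch (the helper function and byte-per-parameter figures are illustrative assumptions, not part of the repo):

```python
# Rough GPU memory needed just to hold model weights, ignoring
# activations, KV cache, and framework overhead (which add more on top).

def weight_memory_gb(num_params_billion: float, bytes_per_param: int) -> float:
    """Return the weight footprint in GiB."""
    return num_params_billion * 1e9 * bytes_per_param / 2**30

# 7B un-quantized (2 bytes/param for fp16/bf16) vs int8-quantized (1 byte/param)
print(f"7B fp16: {weight_memory_gb(7, 2):.1f} GiB")  # ~13 GiB: tight on 16 GiB
print(f"7B int8: {weight_memory_gb(7, 1):.1f} GiB")
print(f"2B fp16: {weight_memory_gb(2, 2):.1f} GiB")
```

With overhead on top of ~13 GiB of weights, the un-quantized 7B model can easily exceed a 16 GiB GPU, whereas the quantized 7B or the 2B variant leaves ample headroom.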

pengchongjin avatar Mar 26 '24 16:03 pengchongjin

Hi @LarryHawkingYoung,

Could you please confirm whether the above comment resolved this issue for you? If so, please feel free to close the issue.

Thank you.

Gopi-Uppari avatar Sep 25 '24 05:09 Gopi-Uppari

Hi @LarryHawkingYoung,

Closing this issue due to lack of recent activity. Please feel free to reopen it if this is still a valid request.

Thank you!

Gopi-Uppari avatar Oct 07 '24 05:10 Gopi-Uppari