cortex.cpp

bug: nitro cuda windows low performance on machine has multiple GPUs - tested using Jan App

Open hiento09 opened this issue 2 years ago • 3 comments

Describe the bug

My Windows machine has 3 GPUs. With all 3 GPUs enabled, token speed was slow (6-9 tokens/s) and Nitro could not even load TinyLlama 1B. With 2 GPUs disabled and only 1 active, performance returned to normal.

Screenshots

  • 3 GPUs active

    • Low performance image
    • Load tinyllama error image
  • 1 GPU active only, then the performance was back to normal image

Desktop (please complete the following information):

  • OS: Windows 11
  • Nvidia driver: 531.18
  • cuda version: 12.3
  • Nitro version: 0.1.27
  • GPU:
    • 1 RTX 4070 Ti
    • 2 GTX 1660 Ti

hiento09 avatar Dec 14 '23 08:12 hiento09

@hiento09 I suspect this problem comes from the communication between the different GPUs. I'll look out for it while reading the codebase right now.

KossBoii avatar Dec 15 '23 00:12 KossBoii

@KossBoii that is exactly the multi-GPU problem. I tested again on that machine:

  • Using only the 4070ti => 55 tok/sec
  • Using either one of the two 1660ti => 28 tok/sec

The distributed inference requires:

  • Good bandwidth between GPUs
  • GPUs that are not too far apart in performance (e.g. in this case the 4070ti has to wait for the 1660ti to finish computing). This setup also uses PCIe 3 and 4, not NVLink => the data has to be transmitted via the CPU to reach another GPU.
  • An explicitly set value for TP (tensor parallel) in nitro.
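
To make the "fast GPU waits for slow GPU" point concrete, here is a rough back-of-envelope sketch. It assumes a simplified tensor-parallel model in which every GPU works on its share of each token in parallel and the step ends when the slowest GPU finishes. The function name, the even split, and the `sync_overhead_s` parameter are illustrative assumptions and do not describe how nitro actually schedules work; only the 55 and 28 tok/sec figures come from the measurements above.

```python
def tensor_parallel_estimate(solo_tok_per_sec, fractions, sync_overhead_s=0.0):
    """Estimate tok/s for a naive tensor-parallel split (hypothetical model).

    solo_tok_per_sec: tok/s each GPU reaches when running the model alone.
    fractions: share of the per-token work given to each GPU (sums to 1).
    sync_overhead_s: assumed per-token cost of moving data between GPUs.
    Returns (estimated tok/s, per-GPU idle seconds per token).
    """
    if len(solo_tok_per_sec) != len(fractions):
        raise ValueError("need one fraction per GPU")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Time each GPU is busy with its share of one token.
    busy = [f / s for f, s in zip(fractions, solo_tok_per_sec)]
    # The step finishes only when the slowest GPU is done, plus transfer cost.
    step = max(busy) + sync_overhead_s
    idle = [step - b for b in busy]
    return 1.0 / step, idle


# Measured solo speeds from this thread: 4070ti, then the two 1660ti.
speeds = [55.0, 28.0, 28.0]
tok_s, idle = tensor_parallel_estimate(speeds, [1 / 3, 1 / 3, 1 / 3])
print(f"ideal tok/s with an even split: {tok_s:.1f}")
print(f"4070ti idle per token: {idle[0] * 1000:.1f} ms")
```

With an even split and zero transfer cost, the 4070ti sits idle for roughly half of every step in this model. Any nonzero `sync_overhead_s` is added on top of that for every token, which is why the bandwidth between GPUs matters so much.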

It depends, but I think running 1 model on a single GPU, selected with CUDA_VISIBLE_DEVICES, makes sense in this case (i.e. a hardware sensing feature).
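
A minimal sketch of the single-GPU approach, assuming the inference server is started as a child process. `single_gpu_env` is a hypothetical helper name, and GPU index `0` is only an example: which index maps to the 4070ti on this machine is not stated in the thread. `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER` are standard CUDA environment variables.

```python
import os


def single_gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # Order devices by PCI bus ID so indices match what nvidia-smi reports.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only this device is visible; the process sees it as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


env = single_gpu_env(0)
print(env["CUDA_VISIBLE_DEVICES"])
# The dict can then be passed to the server process, for example:
#   subprocess.Popen([path_to_server_binary], env=env)
# where path_to_server_binary is a placeholder for the actual executable.
```

The variables have to be set before the CUDA runtime initializes in the target process, which is why the sketch builds the environment for a child process instead of changing it after startup.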

hiro-v avatar Dec 17 '23 08:12 hiro-v

This should be properly supported with this instead: https://github.com/ggerganov/llama.cpp/pull/6017

hiro-v avatar Mar 22 '24 02:03 hiro-v

closing in favor of tracking this more granularly, now that we have various engines

freelerobot avatar Jul 01 '24 05:07 freelerobot