[QUESTION] Splitting big models over multiple GPUs

Open zouharvi opened this issue 1 year ago • 6 comments

When specifying the number of GPUs during inference, is it only for data parallelism, or is the model loaded piecewise across multiple GPUs when it is too big for a single GPU? For example, I'd like to use XCOMET-XXL, and our cluster has many 12 GB GPUs.

At first I thought that the model parts would be loaded across all GPUs, e.g.:

comet-score -s data/xcomet_ennl.src -t data/xcomet_ennl_T1.tgt --gpus 5 --model "Unbabel/XCOMET-XL"

However, I'm getting a GPU OOM on the first GPU:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 10.75 GiB of which 11.62 MiB is free. ...
  1. Is it correct that in the above setting the model is loaded in full on each of the 5 GPUs?
  2. Is there a way to split the model over multiple GPUs?

Thank you!

  • unbabel-comet 2.2.1
  • pytorch-lightning 2.2.0.post0
  • torch 2.2.1

zouharvi avatar Mar 05 '24 12:03 zouharvi

same question here

zwhe99 avatar Mar 14 '24 07:03 zwhe99

Last time I checked, this was not very easy to do with pytorch-lightning.

We actually used a custom-made FSDP implementation to train these larger models (without pytorch-lightning). I have to double-check whether the newer versions support FSDP better than the pytorch-lightning version currently in use (2.2.0.post0).
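
Roughly, the shape of that approach (a minimal sketch, not our actual implementation; the model name and launch command are just examples):

```python
# Sketch: shard a large COMET model with PyTorch FSDP for inference,
# so no single GPU has to hold the full set of weights.
# Example launch: torchrun --nproc_per_node=5 fsdp_score.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from comet import download_model, load_from_checkpoint

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = load_from_checkpoint(download_model("Unbabel/XCOMET-XL"))

# Each rank keeps only a shard of the parameters and gathers the full
# layers on the fly during the forward pass.
model = FSDP(model, device_id=local_rank)
model.eval()
```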

But the short answer: model parallelism is not something we support in the current codebase. With `--gpus 5`, each GPU loads a full copy of the model (data parallelism over the inputs), which is why a 12 GB card runs out of memory.

ricardorei avatar Mar 14 '24 18:03 ricardorei

An idea here: CTranslate2 just integrated tensor parallelism. It also supports XLM-RoBERTa, so I'm wondering if we could adapt the converter a bit so that we could run the model within CT2, which is very fast. How different is it from XLM-RoBERTa at inference?
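
Something along these lines (a sketch based on the CT2 docs for Translator/Generator; whether the XLM-RoBERTa encoder path exposes the same flag, and whether the XCOMET checkpoint converts at all, are exactly the open assumptions here):

```python
# Sketch of CTranslate2 tensor parallelism as documented for
# Translator/Generator. "ct2_model_dir" is a hypothetical directory
# produced by a (to-be-adapted) converter.
# Launch under MPI, e.g.: mpirun -np 2 python run_ct2.py
import ctranslate2

translator = ctranslate2.Translator(
    "ct2_model_dir",
    device="cuda",
    tensor_parallel=True,  # shard the weights across the MPI ranks' GPUs
)
```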

vince62s avatar Mar 14 '24 18:03 vince62s

Does it support XLM-R XL? That architecture also differs from XLM-R.
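
For context, the XL/XXL checkpoints have a separate model type in transformers (the layer-norm placement differs from XLM-R Large, as far as I recall), so a converter written against plain `xlm-roberta` would not cover them as-is:

```python
# Quick check of the model types the converter would have to handle.
from transformers import AutoConfig

print(AutoConfig.from_pretrained("xlm-roberta-large").model_type)        # xlm-roberta
print(AutoConfig.from_pretrained("facebook/xlm-roberta-xl").model_type)  # xlm-roberta-xl
```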

ricardorei avatar Mar 14 '24 18:03 ricardorei

It seems like they have actually improved the documentation a lot: https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html
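
If that holds up, enabling it would look roughly like this (a sketch based on those docs, untested with our models; `model` would be the COMET LightningModule):

```python
# Sketch from the linked Lightning docs: shard the model across GPUs
# with FSDPStrategy instead of replicating it on every device.
import pytorch_lightning as pl
from pytorch_lightning.strategies import FSDPStrategy

trainer = pl.Trainer(
    accelerator="cuda",
    devices=5,
    strategy=FSDPStrategy(),  # shards parameters instead of replicating them
    precision="16-mixed",
)
# trainer.predict(model, dataloaders=...)
```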

ricardorei avatar Mar 14 '24 18:03 ricardorei

> Does it support XLM-R XL? That architecture also differs from XLM-R.

We can adapt it if we have a detailed description somewhere. cc @minhthuc2502

vince62s avatar Mar 14 '24 18:03 vince62s