Nguyen Nhi Thanh Tai
If anybody runs vLLM on Triton server: Triton server will automatically run your LLM instance on every available GPU. So if you have 2 GPUs and you run --tensor-parallel-size 2....
Maybe this link will help: https://huggingface.co/docs/transformers/main/en/deepspeed?models=pretrained+model#non-trainer-deepspeed-integration
(1) In my experience, you can run ZeRO 3 with SFTrainer or Trainer. (2) I don't use accelerate; I use the deepspeed command like this: ``` deepspeed train.py ``` You...
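For anyone who wants something concrete, here is a minimal sketch of a ZeRO stage 3 config written out from Python. The file name `ds_config.json` and the helper names are my own placeholders, not from the comment above, and the `"auto"` values assume you load the file through the Hugging Face Trainer integration, which fills them in from your training arguments.

```python
import json


def build_zero3_config():
    """Return a minimal DeepSpeed config dict that enables ZeRO stage 3.

    The "auto" entries are an assumption: they are meant to be resolved by
    the Hugging Face Trainer integration, not by DeepSpeed on its own.
    """
    return {
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": "auto"},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


def write_config(path):
    """Write the config to `path` as JSON and return the dict that was written."""
    config = build_zero3_config()
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config


if __name__ == "__main__":
    # Hypothetical file name; point your training arguments at whatever you use.
    write_config("ds_config.json")
```

In `train.py` you would then pass the path through the `deepspeed` argument of `TrainingArguments` (for example `TrainingArguments(..., deepspeed="ds_config.json")`) and launch with `deepspeed train.py` as in the comment above. Treat this as a starting point, not a tuned setup.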