CTranslate2
Tensor parallelism with NCCL + MPI
WIP for the tensor parallel feature. Some points still need investigation:
- Make a new version of the converter that writes the number of heads before the self-attention weights and biases, so that grouped-query attention can be handled.
- Packaging the Python wrapper: how to handle the MPI and NCCL dependencies when packaging.
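
To illustrate why the head counts need to be known before the attention weights are read, here is a minimal sketch of how a fused QKV weight might be split across ranks under grouped-query attention. The function name `shard_qkv_rows` and the row layout (all query rows, then key rows, then value rows) are assumptions for illustration only, not the layout or API used in this PR.

```python
def shard_qkv_rows(num_heads, num_kv_heads, head_dim, world_size, rank):
    """Return the (start, stop) row ranges of a fused QKV weight owned by `rank`.

    Hypothetical layout: [Q rows | K rows | V rows], where Q has
    num_heads * head_dim rows and K, V each have num_kv_heads * head_dim rows.
    """
    if num_heads % world_size or num_kv_heads % world_size:
        raise ValueError("head counts must be divisible by world_size")

    # Rows each rank owns in the Q block and in each of the K/V blocks.
    q_per_rank = (num_heads // world_size) * head_dim
    kv_per_rank = (num_kv_heads // world_size) * head_dim

    # Total rows of the Q and K blocks, used as offsets into the fused weight.
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim

    q_start = rank * q_per_rank
    k_start = q_rows + rank * kv_per_rank
    v_start = q_rows + kv_rows + rank * kv_per_rank

    return [
        (q_start, q_start + q_per_rank),
        (k_start, k_start + kv_per_rank),
        (v_start, v_start + kv_per_rank),
    ]


# Example: 8 query heads, 2 key/value heads, head_dim 4, split over 2 ranks.
ranges_rank0 = shard_qkv_rows(8, 2, 4, world_size=2, rank=0)
ranges_rank1 = shard_qkv_rows(8, 2, 4, world_size=2, rank=1)
```

With grouped-query attention the Q block and the K/V blocks have different sizes, so the slice boundaries cannot be derived from the weight shape alone. That is why, in this sketch, both head counts must be available before the weight is loaded.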
LGTM. It helps me a lot. I'm looking forward to seeing the full release version.