FasterTransformer
Support for expert parallelism in MoE inference
#743 also mentions this issue. Is there a tutorial or guide on how to use expert parallelism in MoE inference?