TensorRT
[Feature Request] MoE plugin for onnx
Many models use MoE (Mixture of Experts) layers, for instance:
https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/0549ce0e65119858399d2e4e88ddb4cd3db4c133/moellava/model/language_model/llava_stablelm_moe.py#L483
It would be great if such models could be exported to ONNX with a custom ONNX node for the MoE layer, and if TensorRT supported that node through a plugin.
TensorRT-LLM already has such a plugin. Is it possible to make a general MoE plugin for TensorRT (without TensorRT-LLM, in line with DeepSpeed's MoE)?
PS. MoE as an ONNX Runtime contrib operator (com.microsoft.MoE): https://github.com/microsoft/onnxruntime/blob/884acd4598a437521921dfdec596923afa3f4ed1/docs/ContribOperators.md#commicrosoftmoe
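To make the request concrete, here is a rough per-token sketch of the computation I would expect a general MoE node to cover: softmax gating, top-k expert selection, and a weighted sum of the selected experts' outputs. This is only an illustration, not the TensorRT-LLM or ONNX Runtime implementation. The names `moe_forward`, `gate_weight`, and `make_scale_expert` are hypothetical, the experts are toy scaling functions standing in for FFNs, and renormalizing the gate weights over the selected experts is an assumption (some models skip that step).

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, x):
    # weight is a list of rows, each the same length as x.
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def moe_forward(x, gate_weight, experts, top_k=2):
    """Reference top-k MoE for a single token x (hypothetical helper).

    gate_weight: [num_experts][hidden] router matrix.
    experts: list of callables mapping a hidden vector to a hidden vector.
    """
    probs = softmax(matvec(gate_weight, x))
    # Indices of the top_k experts by gate probability, best first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize gate weights over the selected experts.
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        scale = probs[i] / norm
        out = [o + scale * v for o, v in zip(out, y)]
    return out, top


def make_scale_expert(factor):
    # Toy expert: multiplies the input by a constant (stand-in for an FFN).
    return lambda x: [factor * v for v in x]


x = [1.0, 2.0]
gate_weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 experts, hidden size 2
experts = [make_scale_expert(f) for f in (1.0, 2.0, 3.0)]

out, top = moe_forward(x, gate_weight, experts, top_k=2)
print(top, out)
```

In a real plugin the router matrix and the expert weights would be inputs or attributes of the custom node, and the loop over selected experts would be a batched or grouped GEMM over all tokens rather than a per-token loop.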
@rajeevsrao ^ ^