Yan Li
cuda compute architecture: sm_75
gcc version: 9.3.0
nvcc version: 11.6.55
```
/root/repos/RAJA-PERFSUITE/RAJAPerf/src/apps/HALOEXCHANGE_FUSED-OMP.cpp: In member function 'virtual void rajaperf::apps::HALOEXCHANGE_FUSED::runOpenMPVariant(rajaperf::VariantID, size_t)':
/root/repos/RAJA-PERFSUITE/RAJAPerf/src/apps/HALOEXCHANGE_FUSED-OMP.cpp:132:326: error: invalid application of 'sizeof' to incomplete type 'pack_lambda_type' {aka...
```
#743 also mentions this issue. Is there a tutorial on how to use expert parallelism in MoE inference?
First of all, thanks for your great work. I read the PR you opened in the TensorRT-LLM repo and noticed that EP + TP, PP + TP, and TP are supported during inference....