ZHENG, Zhen comments

ZHENG, Zhen

[BUG] Issue serving Mixtral 8x7B on H100

> @JamesTheZ may know about this. It seems to be because the current implementation compiles `cuda_linear_kernels.cpp` only for Ampere GPUs, so it is not built on an H100 (Hopper): https://github.com/microsoft/DeepSpeed/blob/330d36bb39b8dd33b5603ee0024705db38aab534/op_builder/inference_core_ops.py#L75-L81
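The failure mode can be illustrated with a minimal, hypothetical sketch of an architecture gate in a Python op builder. The function names, the `core_ops.cpp` placeholder, and both gating rules are assumptions for illustration, not the code at the linked lines. Ampere GPUs report compute capability 8.x and the H100 reports 9.0, so a gate that admits only major version 8 drops the source file on an H100, while a `>= 8` gate would keep it. Whether the kernels in that file actually run correctly on Hopper is a separate question the sketch does not address.

```python
# Hypothetical sketch: names and gating rules are illustrative, not DeepSpeed's API.
from typing import Callable, List, Tuple

ComputeCapability = Tuple[int, int]  # (major, minor), e.g. (8, 0)

A100: ComputeCapability = (8, 0)  # Ampere
H100: ComputeCapability = (9, 0)  # Hopper


def major_is_8(cc: ComputeCapability) -> bool:
    """Gate that admits only major version 8, so Hopper (9.0) is rejected."""
    return cc[0] == 8


def major_at_least_8(cc: ComputeCapability) -> bool:
    """Gate that admits major version 8 and every later architecture."""
    return cc[0] >= 8


def select_sources(
    cc: ComputeCapability, gate: Callable[[ComputeCapability], bool]
) -> List[str]:
    """Return the sources to compile for a device with capability `cc`."""
    sources = ["core_ops.cpp"]  # hypothetical source that is always built
    if gate(cc):
        # Only added when the gate accepts the device's architecture.
        sources.append("cuda_linear_kernels.cpp")
    return sources
```

With the stricter gate, `select_sources(H100, major_is_8)` returns only `["core_ops.cpp"]`, so any op that depends on `cuda_linear_kernels.cpp` would be missing at runtime on that GPU.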