[REQUEST] Run DeepSpeed inference in C++

Open • dashesy opened this issue 3 years ago • 0 comments

Is your feature request related to a problem? Please describe.
What is the best way to run DeepSpeed inference in C++?

Describe the solution you'd like
If this is already possible, please document how, perhaps using TorchScript with custom ops. Otherwise, provide a way to run the model in C++.

Describe alternatives you've considered
Using TorchScript and adding the custom ops.

Additional context
For production environments we need to run inference in C#, but a C++ binding is an acceptable solution.

Jul 25 '22 20:07 dashesy