DeepSpeed
[REQUEST] Run DeepSpeed inference in C++
Is your feature request related to a problem? Please describe.
What is the best way to run DeepSpeed inference in C++?

Describe the solution you'd like
If this is already possible, please document how to do it, perhaps using TorchScript with custom ops. Otherwise, please provide a way to run the model in C++.

Describe alternatives you've considered
Using TorchScript and adding the DeepSpeed ops to it.

Additional context
For production environments we need to run inference in C#, but a C++ binding is an acceptable solution.