
[REQUEST] Code sample to use DeepSpeed inference without having to run the deepspeed cmd line in a production setup

Open dhawalkp opened this issue 4 years ago • 1 comments

Is your feature request related to a problem? Please describe. The current examples for DeepSpeed inference use the command-line 'deepspeed' launcher, which internally uses DeepSpeed's launcher modules to initialize the NCCL/MPI backend and discover the ranks, world size, etc. While running inference from the command line is convenient for development, it is not useful in production. Most production code serves inference in real time through a serving stack and initializes frameworks like DeepSpeed via Python packages; it cannot rely on launching a command-line process for each inference request.

Describe the solution you'd like Provide a clear example of using DeepSpeed inference without the 'deepspeed' command line. Show in that example how to initialize the backend with deepspeed.init_distributed(), call deepspeed.init_inference(), and get rid of having to run the launcher. I called deepspeed.init_distributed() in my code, but the backend always fails to initialize even though I set all the right environment variables for RANK, LOCAL_RANK, WORLD_SIZE, etc.
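For concreteness, here is a minimal sketch of the kind of example meant here, not an official DeepSpeed recipe: a single process that sets up what the launcher would normally export, then initializes the engine in-process. The gpt2 model is a placeholder, and the init_inference arguments (mp_size, dtype, replace_with_kernel_inject) reflect the API of DeepSpeed releases around that time.

```python
# Minimal sketch (not an official example): one Python process doing what
# the `deepspeed` launcher would otherwise do for us.
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# The launcher normally exports these; in a serving process we set them
# ourselves. MASTER_ADDR/MASTER_PORT are also required by the env://
# rendezvous and are easy to forget.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("LOCAL_RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Initialize the NCCL backend in-process instead of via the launcher.
deepspeed.init_distributed(dist_backend="nccl")

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the model with the DeepSpeed inference engine.
engine = deepspeed.init_inference(
    model,
    mp_size=int(os.environ["WORLD_SIZE"]),
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
with torch.no_grad():
    output_ids = engine.module.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```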

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

dhawalkp avatar Jan 17 '22 16:01 dhawalkp

I agree with you. @dhawalkp, have you found any solution for this?

tahercoolguy avatar Aug 11 '22 07:08 tahercoolguy

@tahercoolguy @dhawalkp I'm also looking for a way to initialise an inference engine without using the command line. Also, were you able to distribute the model across multiple GPUs?

When I pass --num_gpus, it just spawns one process per GPU and runs the script in each of them. Let me know if you were able to get a single inference result when the model was distributed across multiple GPUs.
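For what it's worth, a rough sketch of how that multi-GPU (tensor-parallel) case might look, assuming one process per GPU spawned by the launcher; the model name and arguments are illustrative, not an official recipe. Every rank runs the same script, init_inference(mp_size=world_size) shards the weights across the GPUs, all ranks call generate together, and only rank 0 reports the single result:

```python
# Rough sketch of tensor-parallel inference: run one process per GPU,
# e.g. `deepspeed --num_gpus 2 this_script.py`. Model and arguments are
# illustrative only.
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

deepspeed.init_distributed(dist_backend="nccl")
rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
world_size = int(os.environ["WORLD_SIZE"])
torch.cuda.set_device(local_rank)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# mp_size > 1 shards the weights across the participating GPUs
# (tensor parallelism); each rank holds only a slice of the model.
engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

# Every rank must run the same generate call; the shards cooperate over
# NCCL to compute one shared result.
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
with torch.no_grad():
    output_ids = engine.module.generate(inputs.input_ids, max_new_tokens=20)

# All ranks hold the same output; report it once, from rank 0.
if rank == 0:
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```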

sanxchep avatar Sep 20 '22 07:09 sanxchep

@dhawalkp @sanxchep Did you ever find a solution for this?

ChrisStormm avatar Jun 26 '24 16:06 ChrisStormm

@ChrisStormm, @dhawalkp, @sanxchep how about the following HF integration example? https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py

tjruwase avatar Jun 26 '24 16:06 tjruwase