Are there any examples of using the offload feature in GPT/BLOOM/OPT inference?

YJHMITWEB opened this issue · 1 comment

Hi, currently only the linear example demonstrates a naive form of offload. The other example projects, such as OPT, BLOOM, and GPT, have no offload option. How can offload be applied to large-model inference, and are there any examples?

Mar 17 '23 18:03 YJHMITWEB

Hi @YJHMITWEB, this is technically feasible, but it would sharply reduce inference speed, because offloaded weights have to be copied back to the GPU for every forward pass. Its practical value is therefore limited, and we do not currently consider it a high priority. You are welcome to submit a proposal or PR to contribute this feature. Thanks.

Mar 20 '23 06:03 binmakeswell
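To put the speed penalty in perspective, here is a rough back-of-the-envelope sketch (not code from this project) that estimates a lower bound on the host-to-device copy time per forward pass when weights live in CPU memory. The figures are illustrative assumptions, not numbers from this thread: about 176B parameters (roughly BLOOM-176B), fp16 weights at 2 bytes per parameter, about 32 GB/s of host-to-device bandwidth (near the theoretical peak of PCIe 4.0 x16), and no overlap between copies and computation. Both function names are hypothetical.

```python
def offload_transfer_seconds(num_params: float, bytes_per_param: float,
                             offloaded_fraction: float, bandwidth_gb_s: float) -> float:
    """Lower bound on host-to-device copy time for one forward pass.

    Assumes every offloaded weight is copied to the GPU once per forward pass,
    with no overlap between copies and computation.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    offloaded_bytes = num_params * bytes_per_param * offloaded_fraction
    return offloaded_bytes / (bandwidth_gb_s * 1e9)  # convert GB/s to bytes/s


def max_decode_tokens_per_second(transfer_seconds: float, batch_size: int) -> float:
    """Upper bound on generated tokens/s when weight transfers dominate.

    Autoregressive decoding runs one forward pass per new token, and each pass
    yields one token for every sequence in the batch.
    """
    return batch_size / transfer_seconds


if __name__ == "__main__":
    # Illustrative assumptions: ~176B parameters in fp16, ~32 GB/s host-to-device.
    for fraction in (1.0, 0.5):
        t = offload_transfer_seconds(176e9, 2, fraction, 32)
        print(f"offloaded {fraction:.0%}: >= {t:.1f} s per forward pass, "
              f"<= {max_decode_tokens_per_second(t, 1):.3f} tokens/s at batch size 1")
```

Under these assumptions, fully offloading the weights costs at least 11 seconds of transfer per forward pass, and generation needs one forward pass per new token. Overlapping copies with compute or using large batches can amortize that cost, but the bandwidth bound remains.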