EnergonAI
Are there any examples of using the offload feature in GPT/BLOOM/OPT inference?
Hi, among the current examples, only the linear example demonstrates a naive use of offload; the other projects (OPT, BLOOM, GPT) expose no offload option.
I am wondering how to apply offloading to large-model inference. Are there any examples?
Hi @YJHMITWEB, this is technically feasible, but it would cause a sharp decline in inference speed because weights must be transferred to the device on every forward pass. The practical benefit is therefore limited, and we do not currently consider it a high priority. You are welcome to submit a proposal or PR to contribute this feature. Thanks.
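For anyone curious why offloaded inference is so slow, here is a minimal conceptual sketch of the layer-wise offload pattern. This is not EnergonAI's API: tensors are plain Python lists and the "device" is simulated, so the point is only to show that every forward pass re-copies every layer's weights from host memory, which is the transfer cost that dominates real offloaded inference over PCIe.

```python
def matvec(weight, x):
    """Dense layer: y = W @ x, with W as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

class OffloadedModel:
    """All layer weights live in host ("CPU") memory; each layer is
    copied to the "device" only for the duration of its own compute."""

    def __init__(self, cpu_weights):
        self.cpu_weights = cpu_weights
        self.transfers = 0  # count simulated host-to-device copies

    def _fetch(self, i):
        # Simulate an H2D copy. In a real system this is the expensive
        # PCIe transfer repeated on every forward pass.
        self.transfers += 1
        return [row[:] for row in self.cpu_weights[i]]

    def forward(self, x):
        for i in range(len(self.cpu_weights)):
            w = self._fetch(i)   # load one layer onto the device
            x = matvec(w, x)     # compute with it
            del w                # free device memory before the next layer
        return x

weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity layer
    [[2.0, 0.0], [0.0, 2.0]],  # scale-by-2 layer
]
model = OffloadedModel(weights)
out = model.forward([1.0, 3.0])
print(out)              # [2.0, 6.0]
print(model.transfers)  # 2: one copy per layer, repeated every forward pass
```

With N layers and B batches of requests, the copies scale as N * B, while keeping weights resident on the GPU pays the transfer cost only once. That ratio is why offload is mainly useful when the model simply does not fit in GPU memory.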