Yanyang LI
Hi, I am trying to use your PR to run LLaMA-65B. How should I do this? Directly using `LlamaForCausalLM.from_pretrained` and launching with `deepspeed --num_gpus 8` seems to consume a lot...
> @lyy1994 You can refer to https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py#L24 to see how meta tensors are used; check how that example handles the variable `use_meta_tensor`. I doubt whether it is compatible with the...
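For a rough sense of why meta tensors matter at this scale, here is a back-of-the-envelope sketch. It assumes the nominal 65e9 parameter count implied by the model name, 2-byte (fp16) weights, and that each of the 8 launched processes loads a full copy of the weights when `from_pretrained` is called directly. None of these figures come from the thread, and the sharded number is an idealized lower bound, not a measurement.

```python
# Back-of-the-envelope weight-memory estimate. All constants are assumptions
# made for illustration, not figures reported in this thread.
NUM_PARAMS = 65e9      # nominal parameter count implied by the name "65B"
BYTES_PER_PARAM = 2    # assumes fp16 or bf16 weights
NUM_RANKS = 8          # matches `deepspeed --num_gpus 8`


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB, ignoring activations and buffers."""
    return num_params * bytes_per_param / 2**30


# One full copy of the weights.
full_copy_gib = weights_gib(NUM_PARAMS, BYTES_PER_PARAM)

# If every rank materializes its own full copy before sharding.
eager_total_gib = full_copy_gib * NUM_RANKS

# Idealized case: the model is built on the meta device (no weight storage)
# and each rank then loads only its own 1/NUM_RANKS slice.
sharded_per_rank_gib = full_copy_gib / NUM_RANKS

print(f"one full copy:        {full_copy_gib:8.1f} GiB")
print(f"eager, all ranks:     {eager_total_gib:8.1f} GiB")
print(f"sharded, per rank:    {sharded_per_rank_gib:8.1f} GiB")
```

Under these assumptions the eager path needs roughly eight times the weight size in total, which is the overhead the meta-tensor path in the linked example is meant to avoid.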