Yanyang LI comments

Results: 3 comments of Yanyang LI

Fix llama meta tensor loading, model tensor parallelism inference

Hi, I am trying to use your PR to run LLaMA-65B. How should I do this? Directly using `LlamaForCausalLM.from_pretrained` and launching with `deepspeed --num_gpus 8` seems to consume a lot...

Fix llama meta tensor loading, model tensor parallelism inference

> @lyy1994 You can refer to https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py#L24 to see how to use meta tensors; check how the example uses the variable `use_meta_tensor`. I doubt whether it is compatible with the...

Fix llama meta tensor loading, model tensor parallelism inference

> > @lyy1994 You can refer to https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py#L24 to see how to use meta tensors; check how the example uses the variable `use_meta_tensor`. I doubt whether it is compatible with...