Richard Li
 I use the following command to start 2 processes, but the model loads on only one device: `CUDA_VISIBLE_DEVICES="2,1" torchrun --nproc_per_node 2 example.py --ckpt_dir LLaMA/13B --tokenizer_path LLaMA/tokenizer.model`
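A minimal sketch of one likely cause (an assumption, not confirmed from the post): `torchrun` sets a `LOCAL_RANK` environment variable for each worker, and if the script never reads it to pick a device, every rank defaults to `cuda:0`. With `CUDA_VISIBLE_DEVICES="2,1"`, local rank 0 sees physical GPU 2 and local rank 1 sees physical GPU 1:

```python
import os

# Hedged sketch: torchrun exports LOCAL_RANK (0..nproc_per_node-1) per worker.
# For a real run this line is a no-op; it only makes the sketch runnable alone.
os.environ.setdefault("LOCAL_RANK", "0")

local_rank = int(os.environ["LOCAL_RANK"])
# Each rank should pin itself to its own device, e.g. model.to(device) or
# torch.cuda.set_device(local_rank); otherwise all ranks share cuda:0.
device = f"cuda:{local_rank}"
print(device)
```

If the example script already calls `torch.distributed.init_process_group` but loads weights with a hard-coded device, moving the load onto `device` per rank is the usual fix.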
The code is mostly modified from [LLaVA](https://github.com/haotian-liu/LLaVA)
Hello, I noticed that the 33B model was pretrained with a sequence length of 512. If I continue with instruction finetuning on top of it, and the finetuning data is longer than 512 tokens, will that cause any problems?
What should I do if I want to use tensor_parallel with a GPTQ-quantized model ([Llama-2-7b-Chat-GPTQ](https://huggingface.co/4bit/Llama-2-7b-Chat-GPTQ), for example) to run inference on 2 or more GPUs? Currently, I am using AutoGPTQ to...