Results: 2 issues by nutmilk10

I have 8 Tesla V100 32 GB GPUs and set tensor_parallel_size to 8, which should be enough to run meta-llama/Llama-2-70b-chat-hf, but I am getting a ``` RuntimeError: CUDA error: out...
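For scale, a back-of-the-envelope sketch (an illustration, not vLLM's actual memory accounting): a 70B-parameter model in float16 needs roughly 2 bytes per parameter for the weights alone, about 140 GB, before any KV cache or activations. That fits in the 256 GB aggregate across eight 32 GB V100s, but the per-GPU headroom left for the KV cache is modest, which is consistent with an out-of-memory error at load or during generation.

```python
# Rough memory estimate for serving Llama-2-70b in fp16 across 8x 32 GB V100s.
# All figures are approximations for illustration only.

params = 70e9                # ~70B parameters
bytes_per_param = 2          # float16

weights_gb = params * bytes_per_param / 1e9   # total weight memory, ~140 GB

gpus = 8
gb_per_gpu = 32
total_gb = gpus * gb_per_gpu                  # 256 GB aggregate

per_gpu_weights = weights_gb / gpus           # ~17.5 GB of weights per GPU
headroom_gb = gb_per_gpu - per_gpu_weights    # what's left for KV cache, activations

print(f"weights: {weights_gb:.0f} GB total, {per_gpu_weights:.1f} GB/GPU, "
      f"~{headroom_gb:.1f} GB/GPU left for KV cache and activations")
```

The remaining ~14 GB per GPU must hold the KV cache, CUDA graphs, and activation workspace, so a large `max_seq_length` or high `gpu_memory_utilization` can still push a V100 over its limit.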

### Your current environment

```text
import torch
from vllm import LLM
from unsloth import FastLanguageModel

llm, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
    device_map=device,
)
FastLanguageModel.for_inference(llm)
llm = LLM(model=llm)
```

### How...
