Results: 2 issues by nutmilk10

I have 8 Tesla V100 32 GB GPUs and set tensor_parallel_size to 8, which should be enough to run meta-llama/Llama-2-70b-chat-hf, but I am getting a ``` RuntimeError: CUDA error: out...
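For scale, a back-of-the-envelope sketch (an illustration, not vLLM's actual memory accounting): a 70B-parameter model in float16 needs roughly 2 bytes per parameter for the weights alone, about 140 GB, before any KV cache or activations. That fits in the 256 GB aggregate across eight 32 GB V100s, but the per-GPU headroom left for the KV cache is modest, which is consistent with an out-of-memory error at load or during generation.

```python
# Rough memory estimate for serving Llama-2-70b in fp16 across 8x 32 GB V100s.
# All figures are approximations for illustration only.

params = 70e9                # ~70B parameters
bytes_per_param = 2          # float16

weights_gb = params * bytes_per_param / 1e9   # total weight memory, ~140 GB

gpus = 8
gb_per_gpu = 32
total_gb = gpus * gb_per_gpu                  # 256 GB aggregate

per_gpu_weights = weights_gb / gpus           # ~17.5 GB of weights per GPU
headroom_gb = gb_per_gpu - per_gpu_weights    # what's left for KV cache, activations

print(f"weights: {weights_gb:.0f} GB total, {per_gpu_weights:.1f} GB/GPU, "
      f"~{headroom_gb:.1f} GB/GPU left for KV cache and activations")
```

The remaining ~14 GB per GPU must hold the KV cache, CUDA graphs, and activation workspace, so a large `max_seq_length` or high `gpu_memory_utilization` can still push a V100 over its limit.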

### Your current environment

```text
import torch
from vllm import LLM
from unsloth import FastLanguageModel

llm, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True,
    device_map=device,
)
FastLanguageModel.for_inference(llm)
llm = LLM(model=llm)
```

### How...
