Subham Kumar


What's the current best option if I have to serve this 4-bit finetuned model with vLLM for inference? Is it to convert it to 16-bit and then run inference on that?
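For anyone hitting the same question: one common route is to merge the 4-bit LoRA adapter back into 16-bit weights and point vLLM at the merged checkpoint. Below is a minimal sketch assuming an Unsloth-finetuned model and Unsloth's `save_pretrained_merged` helper; the names `lora_model` and `merged-16bit` are illustrative placeholders, not paths from the original comment.

```python
from unsloth import FastLanguageModel

# Load the 4-bit finetuned checkpoint ("lora_model" is a placeholder).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Dequantize and merge the LoRA adapter into 16-bit base weights.
model.save_pretrained_merged(
    "merged-16bit",             # output directory for the merged fp16 weights
    tokenizer,
    save_method="merged_16bit",
)

# vLLM can then load the merged directory like any Hugging Face checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="merged-16bit")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

The merge step trades disk space and load time for compatibility: vLLM serves a plain 16-bit checkpoint without needing bitsandbytes-style 4-bit support at inference time.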

Facing the same issue, `NotImplementedError: No operator found for memory_efficient_attention_forward`, in a SageMaker notebook. I used the method below:

```
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps...
```
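This error usually means the installed xformers wheel has no kernels matching the environment's torch/CUDA build. A quick diagnostic, sketched below, is xformers' own info module plus a reinstall from the PyTorch wheel index; the `cu121` index is an assumption about the CUDA version and may need adjusting for a given SageMaker image.

```python
# List which memory-efficient attention operators xformers can dispatch to;
# "unavailable" entries here typically explain the NotImplementedError.
!python -m xformers.info

# If the operators are unavailable, reinstall xformers against the
# environment's torch/CUDA build (cu121 is an example, not a known-good
# value for every SageMaker instance).
!pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```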