2 comments of Subham Kumar
What's the current best option if I have to use this 4-bit finetuned model with vLLM for inference? Is it to convert it to 16-bit and then run inference?
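(For reference, the usual route is to merge the LoRA adapters into a 16-bit checkpoint and point vLLM at that directory. A minimal sketch, assuming the model was finetuned with Unsloth: the `"lora_model"` / `"merged_16bit_model"` paths and `max_seq_length` are placeholders, while `save_pretrained_merged(..., save_method="merged_16bit")` is Unsloth's merge helper. The merge and the vLLM serving step are typically run as separate scripts.)

```python
# Step 1: load the 4-bit finetune and merge the LoRA adapters into 16-bit weights.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",  # placeholder: path to the saved 4-bit finetune
    max_seq_length=2048,      # placeholder: match your training config
    load_in_4bit=True,
)
model.save_pretrained_merged(
    "merged_16bit_model", tokenizer, save_method="merged_16bit"
)

# Step 2 (separate process): vLLM loads the merged checkpoint like any HF model.
from vllm import LLM, SamplingParams

llm = LLM(model="merged_16bit_model")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```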
Facing the same issue, `NotImplementedError: No operator found for memory_efficient_attention_forward`, in a SageMaker notebook. I used the method below:

```
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps...
```
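(For anyone hitting this: the error generally means xformers' fused attention kernels were not built for the notebook's GPU / CUDA / torch combination. A short diagnostic sketch; the `cu121` index URL is an assumption and should be matched to the environment's actual CUDA build:)

```python
# Show which xformers operators are available for this GPU/CUDA/torch combo.
!python -m xformers.info

# If memory_efficient_attention shows as unavailable, reinstalling an xformers
# wheel built against the notebook's torch/CUDA version often resolves it.
# cu121 below is an assumption; substitute your CUDA version.
!pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```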