Subham Kumar


What's the current best option if I have to serve this 4-bit finetuned model with vLLM for inference? Is it to convert it to 16-bit and then run inference on that?
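For anyone hitting the same question: one common route is to merge the 4-bit LoRA adapter back into 16-bit weights and point vLLM at the merged checkpoint. Below is a minimal sketch assuming an Unsloth-finetuned model and Unsloth's `save_pretrained_merged` helper; the names `lora_model` and `merged-16bit` are illustrative placeholders, not paths from the original comment.

```python
from unsloth import FastLanguageModel

# Load the 4-bit finetuned checkpoint ("lora_model" is a placeholder).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Dequantize and merge the LoRA adapter into 16-bit base weights.
model.save_pretrained_merged(
    "merged-16bit",             # output directory for the merged fp16 weights
    tokenizer,
    save_method="merged_16bit",
)

# vLLM can then load the merged directory like any Hugging Face checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="merged-16bit")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

The merge step trades disk space and load time for compatibility: vLLM serves a plain 16-bit checkpoint without needing bitsandbytes-style 4-bit support at inference time.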

Facing the same issue, `NotImplementedError: No operator found for memory_efficient_attention_forward`, in a SageMaker notebook. I used the method below:

```
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps...
```
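This error usually means the installed xformers wheel has no kernels matching the environment's torch/CUDA build. A quick diagnostic, sketched below, is xformers' own info module plus a reinstall from the PyTorch wheel index; the `cu121` index is an assumption about the CUDA version and may need adjusting for a given SageMaker image.

```python
# List which memory-efficient attention operators xformers can dispatch to;
# "unavailable" entries here typically explain the NotImplementedError.
!python -m xformers.info

# If the operators are unavailable, reinstall xformers against the
# environment's torch/CUDA build (cu121 is an example, not a known-good
# value for every SageMaker instance).
!pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```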