
Python: Unable to quantize huggingface models while creating service using semantic kernel.

Open sadaf0714 opened this issue 1 year ago • 7 comments

I want to use a 4-bit quantized Mistral model from Hugging Face with Semantic Kernel so that I can run it on the Google Colab free tier, but I can't find a way to pass this configuration while creating the service. This is the code I am using to create the service:

```python
kernel = Kernel()
text_service_id = 'mistralai/Mistral-7B-Instruct-v0.2'
kernel.add_service(
    service=HuggingFaceTextCompletion(
        task="text-generation",
        service_id=text_service_id,
        ai_model_id=text_service_id,
    )
)
```

Please provide a solution so that I can pass the 4-bit config, whether via bitsandbytes, `load_in_4bit=True`, or something else.

sadaf0714 avatar May 23 '24 06:05 sadaf0714

Any updates?

sadaf0714 avatar May 27 '24 09:05 sadaf0714

@sadaf0714, are you able to share what type of configuration you're wanting to set? Is it just `load_in_4bit=True`? If possible, could you share a link to the docs for this specific use case?

madsbolaris avatar May 28 '24 15:05 madsbolaris

@matthewbolanos This is the configuration:

```python
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=quantization_config,
)
```

Somehow I want to pass this `quantization_config` while creating the service in the kernel so that I can run it on the Google Colab free tier.

sadaf0714 avatar May 29 '24 11:05 sadaf0714

any updates? @matthewbolanos

sadaf0714 avatar Jun 03 '24 06:06 sadaf0714

@eavanvalkenburg Would you take a look at this? Thanks.

alliscode avatar Jun 10 '24 15:06 alliscode

@sadaf0714 You should be able to get it working by passing `model_kwargs={"load_in_4bit": True}` to the `HuggingFaceTextCompletion` constructor. I'm working on a sample for that, and I might add support for a different way as well, but let me see first. Let me know how that goes! (BTW, I had to manually install the `bitsandbytes` package to get it working.)
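Putting the two pieces together, a sketch of what the suggestion above could look like, forwarding the full `BitsAndBytesConfig` through `model_kwargs` rather than just `load_in_4bit=True` (note: this assumes `model_kwargs` is passed through to `from_pretrained` unchanged, which may vary between semantic-kernel versions; it also requires `pip install bitsandbytes accelerate` and a CUDA runtime, and will download the model weights):

```python
# Assumption: HuggingFaceTextCompletion forwards model_kwargs to
# transformers' from_pretrained(); verify against your semantic-kernel version.
import torch
from transformers import BitsAndBytesConfig
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.hugging_face import HuggingFaceTextCompletion

service_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Same 4-bit config the issue author wants to use.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

kernel = Kernel()
kernel.add_service(
    HuggingFaceTextCompletion(
        task="text-generation",
        service_id=service_id,
        ai_model_id=service_id,
        # Forward the quantization settings to the underlying model load.
        model_kwargs={
            "quantization_config": quantization_config,
            "device_map": "auto",
        },
    )
)
```

This is a configuration sketch rather than a verified recipe; it cannot be executed without a GPU and the model download, so check the sample linked below for the maintainer-supported approach.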

eavanvalkenburg avatar Jun 21 '24 09:06 eavanvalkenburg

@sadaf0714 I have created a sample, but it still needs some work; have a look and see if you can do the same and get it working!

eavanvalkenburg avatar Jul 01 '24 14:07 eavanvalkenburg

Hey @eavanvalkenburg I have some free time for the upcoming month. Do you need any help with this issue?

rewrlution avatar Nov 13 '24 22:11 rewrlution

Closing this for now; if still needed, we can reopen.

eavanvalkenburg avatar Feb 24 '25 13:02 eavanvalkenburg