Support merge_and_unload for IA3 adapters with 4-bit and 8-bit quantized models
Feature request
Enable merge_and_unload for IA3 adapters on models loaded with 4-bit or 8-bit quantization. Currently, merging fails with the error: "Cannot merge ia3 layers when the model is loaded in 4-bit mode".
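A minimal reproduction sketch of the current failure, assuming a CUDA GPU with transformers, peft, and bitsandbytes installed; the model name is only an example, any causal LM would do:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import IA3Config, get_peft_model

# Load the base model in 4-bit via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # example model only
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach an IA3 adapter
model = get_peft_model(model, IA3Config(task_type="CAUSAL_LM"))

# Currently raises: "Cannot merge ia3 layers when the model is loaded in 4-bit mode"
model = model.merge_and_unload()
```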
Motivation
Existing merge_and_unload support excludes 4-bit quantized models with IA3 adapters. Merging IA3 adapters into a 4-bit quantized base model leverages the size reduction of quantization and simplifies deployment by producing a single, smaller model. This feature aligns with the core advantages of IA3 (reduced model size) and 4-bit quantization (efficiency gains), enabling users to fully exploit both optimizations.
Your contribution
While I cannot currently submit a pull request, I'm happy to provide further details, test the functionality after implementation, and assist with documentation updates if needed.