Support merge_and_unload for IA3 adapters with 4-bit and 8-bit quantized models
Feature request
Enable merge_and_unload for IA3 adapters on models loaded with 4-bit or 8-bit quantization. Currently, merging fails with the error: "Cannot merge ia3 layers when the model is loaded in 4-bit mode".
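A minimal reproduction sketch of the current failure, assuming a CUDA GPU with transformers, peft, and bitsandbytes installed; the model name is only an example, any causal LM would do:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import IA3Config, get_peft_model

# Load the base model in 4-bit via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # example model only
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach an IA3 adapter
model = get_peft_model(model, IA3Config(task_type="CAUSAL_LM"))

# Currently raises: "Cannot merge ia3 layers when the model is loaded in 4-bit mode"
model = model.merge_and_unload()
```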
Motivation
Existing merge_and_unload support excludes 4-bit quantized models with IA3 adapters. Merging IA3 adapters into a 4-bit quantized base model leverages the size reduction of quantization and simplifies deployment by producing a single, smaller model. This feature aligns with the core advantages of IA3 (reduced model size) and 4-bit quantization (efficiency gains), enabling users to fully exploit both optimizations.
Your contribution
While I cannot currently submit a pull request, I'm happy to provide further details, test the functionality after implementation, and assist with documentation updates if needed.