OmniQuant
Cannot compile with mlc-llm
I quantized a custom fine-tuned Llama-2 70B model like this:
$ python main.py \
--model /data/finetuned_llama2_70b \
--epochs 20 \
--output_dir /data/finetuned_llama2_70b_output \
--wbits 4 \
--abits 16 \
--group_size 128 \
--lwc \
--net Llama-2-70b
$ python main.py \
--model /data/finetuned_llama2_70b \
--epochs 0 \
--output_dir /data/finetuned_llama2_70b_output2 \
--save_dir /data/finetuned_llama2_70b_omniquant \
--resume /data/finetuned_llama2_70b_output/omni_parameters.pth \
--wbits 4 \
--abits 16 \
--group_size 128 \
--lwc \
--net Llama-2-70b
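Before going to mlc-llm, I'd first sanity-check that the exported checkpoint still behaves like a normal Hugging Face model. A minimal sketch (my own, assuming --save_dir writes a standard fp16 Hugging Face checkpoint; the path and prompt are just examples):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "/data/finetuned_llama2_70b_omniquant"  # --save_dir from the second run above

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
# Load the fake-quantized weights as a regular fp16 model, spread across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))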
Then I updated mlc_llm/quantization/__init__.py like this:
"w4a16g128asym": QuantizationScheme(
name="w4a16g128asym",
linear_weight=GroupQuantizationSpec(
dtype="float16",
mode="int4",
sym=False,
storage_nbit=16,
group_size=128,
transpose=False,
),
embedding_table=None,
final_fc_weight=None,
)
When I tried to compile the model with mlc-llm,
$ python -m mlc_llm.build \
--model /data/finetuned_llama2_70b_omniquant \
--target cuda \
--quantization w4a16g128asym \
--artifact-path /data/finetuned_llama2_70b_omniquant_mlc \
--use-cache 0
I got this error:
Start computing and quantizing weights... This may take a while.
Traceback (most recent call last):
File "~/mlc-llm/mlc_llm/build.py", line 42, in main
core.build_model_from_args(parsed_args)
File "~/mlc-llm/mlc_llm/core.py", line 619, in build_model_from_args
new_params = utils.convert_weights(param_manager, params, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/mlc-llm/mlc_llm/utils.py", line 258, in convert_weights
vm["transform_params"]()
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "~/mambaforge/envs/mlc/lib/python3.11/site-packages/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
raise py_err
File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 558, in get_item
for torch_binname in [
^
File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 559, in <listcomp>
self.torch_pname2binname[torch_pname] for torch_pname in torch_pnames
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'model.layers.0.self_attn.q_proj.weight'
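The KeyError means mlc-llm could not find 'model.layers.0.self_attn.q_proj.weight' among the parameter names it reads from the checkpoint, so its torch_pname2binname lookup fails. A quick way to see which names the OmniQuant export actually contains (a sketch assuming a standard Hugging Face layout under --save_dir):

import json
import os
import torch

ckpt_dir = "/data/finetuned_llama2_70b_omniquant"

# Sharded HF checkpoints ship an index that maps every parameter name to its shard
# (newer transformers may write model.safetensors.index.json instead).
index_path = os.path.join(ckpt_dir, "pytorch_model.bin.index.json")
if os.path.exists(index_path):
    with open(index_path) as f:
        names = sorted(json.load(f)["weight_map"])
else:
    # Single-file checkpoint: load on CPU just to read the keys.
    names = sorted(torch.load(os.path.join(ckpt_dir, "pytorch_model.bin"), map_location="cpu"))

print(len(names), "parameters")
print([n for n in names if "layers.0.self_attn" in n])

If those names carry an extra prefix or the index file is missing entirely, that would explain why param_manager cannot resolve the expected key.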
I'm hitting the same error. Is there any progress on this issue so far? @0x1997
@ChenMnZ Do you have any progress or tips on this? I'm not sure whether the quantized weights were actually loaded successfully in mlc.