TensorRT-LLM
Mixtral 8x7B smoothquant failed
System Info
p4de (4 80GB A100 GPUs)
Who can help?
@Tracin @byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
1. Install the 0.9.0 release.
2. Run `python ./llama/convert_checkpoint.py --model_dir ${MODEL_DIR} --output_dir ./ckpt --dtype float16 --tp_size 8 --workers 8 --smoothquant 0.5`
Expected behavior
Checkpoint created
Actual behavior
Traceback (most recent call last):
  in convert_layer
    qkv_weight = qkv_para[prefix + 'self_attn.qkv_proj']
KeyError: 'model.layers.0.self_attn.qkv_proj'
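A minimal sketch of what the traceback shows, assuming (hypothetically, based on the error alone) that the SmoothQuant conversion path expects a fused QKV weight keyed under `self_attn.qkv_proj`, while the Mixtral checkpoint capture only yields the separate projections:

```python
# Hypothetical illustration of the failure mode: qkv_para holds only the
# separate q/k/v projection keys (as in a Mixtral checkpoint), but the
# convert script looks up a fused 'qkv_proj' key that was never created.
qkv_para = {
    'model.layers.0.self_attn.q_proj': 'q_weight',
    'model.layers.0.self_attn.k_proj': 'k_weight',
    'model.layers.0.self_attn.v_proj': 'v_weight',
}

prefix = 'model.layers.0.'
try:
    qkv_weight = qkv_para[prefix + 'self_attn.qkv_proj']
except KeyError as e:
    # Reproduces the error message seen in the traceback.
    print(f'KeyError: {e}')
```

The key names here mirror the traceback; the dictionary contents are an assumption about what the capture pass produces for an unsupported architecture.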
Additional notes
n/a
@vnkc1 Mixtral MoE SmoothQuant is not currently supported, as noted in the precision support matrix.
Thanks! Is there a plan to support MoE SmoothQuant?
It is not on our roadmap right now. If you are interested, you could create a feature-request ticket. Closing this issue.