lm_head is not converted to QuantLinear with MXFP4/8
lm_head quantization still has some issues:
- a deepcopy is needed if tied_word_embedding = True
- export is not applied to lm_head
Shall we warn users that lm_head is not supported? @WeiweiZhang1 @wenhuach21
BTW, AFAIK, QuantLinear for MXFP4/8 has no forward function, which may confuse users about how to use it. Do we plan to support it?
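For reference, roughly the flow where I hit this, as a minimal sketch (the model name and the `scheme="MXFP4"` argument are illustrative assumptions; the point is that right after `quantize_and_save()` the in-memory `lm_head` is still a plain Linear):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-8B"  # illustrative; any causal LM shows the same behavior
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Quantize with an MXFP4 scheme and export in the auto_round format
# (exact scheme/argument names may differ by AutoRound version).
autoround = AutoRound(model, tokenizer, scheme="MXFP4")
autoround.quantize_and_save("./qwen3-8b-mxfp4", format="auto_round")

# Inspect the in-memory model right after quantize_and_save():
# lm_head is still torch.nn.Linear rather than an MXFP4 QuantLinear.
print(type(autoround.model.lm_head))
```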
If tied_word_embedding = True, lm_head quantization is disabled by default. What's the issue? And what do you mean by "QuantLinear for MXFP4/8 has no forward function"?
If the user prefers to quantize lm_head, what's the solution?
How do you run the model? Why would QuantLinear have no forward?
https://github.com/intel/auto-round/blob/8d8a1cd5daaf6e8c71d079eccaec3092fa9af4f1/auto_round/export/export_to_autoround/qlinear_fp.py#L61 The forward is not implemented in AutoRound.
How do you use the model? Please attach the command. After packing and saving, the model should be reloaded, and only then is this MXFP4QuantLinear layer actually called.
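Roughly like this (a sketch, assuming the checkpoint was saved in the auto_round format; importing AutoRoundConfig is what registers the quantized loader with transformers):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # registers the auto_round quantization backend

quantized_path = "./qwen3-8b-mxfp4"  # the path passed to quantize_and_save()
model = AutoModelForCausalLM.from_pretrained(quantized_path, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_path)

# After reloading, the packed layers are replaced with QuantLinear modules
# whose forward is actually exercised during generation.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```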
I was using the in-memory model directly after quantize_and_save() and only just became aware that AutoRound requires reloading before inference.
I tried Qwen3-8B, which does not use tied_word_embedding, and the lm_head is still not quantized. I noticed the quantization progress bar contains this op, yet module replacement is not applied.
Did you enable quant_lm_head or set bits for lm_head?
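For example, something along these lines (a sketch; the exact argument names such as `quant_lm_head` and the `layer_config` schema may differ by version, so treat them as assumptions and check the docs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Option 1: dedicated switch (assumed to mirror the CLI's --quant_lm_head).
autoround = AutoRound(model, tokenizer, scheme="MXFP4", quant_lm_head=True)

# Option 2: set bits for lm_head explicitly via a per-layer config
# (assumed layer_config schema).
layer_config = {"lm_head": {"bits": 4}}
autoround = AutoRound(model, tokenizer, scheme="MXFP4", layer_config=layer_config)

autoround.quantize_and_save("./qwen3-8b-mxfp4-lmhead", format="auto_round")
```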
Do we plan to support quantizing lm_head when tied_word_embedding=True?
After reloading, I saw that lm_head is quantized, but I'm not sure what is happening. @WeiweiZhang1 Do you have any comments? Do you think it's a bug or by design?
At the least, we need to warn users that Xin's way (using the in-memory model directly without reloading) is not supported.