Xin He
## Type of Change

feature

## Description

- [x] support per-channel quantization for higher accuracy
- [x] add observer registry for easy extension
- [x] dump scale_inv from observer...
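Per-channel quantization computes one scale per output channel instead of a single scale for the whole tensor, which preserves accuracy when channel magnitudes differ widely. A minimal symmetric-int8 sketch in plain Python (all names here are illustrative, not the project's actual observer API):

```python
# Minimal per-channel symmetric int8 quantization sketch.
# Illustrative only; not the repository's observer implementation.

def per_channel_scales(weight, qmax=127):
    """Compute one symmetric scale per output channel (row)."""
    scales = []
    for row in weight:
        amax = max(abs(v) for v in row)
        scales.append(amax / qmax if amax > 0 else 1.0)
    return scales

def quantize(weight, scales):
    """Round each row by its own scale, clamping to the int8 range."""
    return [
        [max(-128, min(127, round(v / s))) for v in row]
        for row, s in zip(weight, scales)
    ]

# Two channels with very different ranges: a per-tensor scale would
# crush the small channel, but per-channel scales keep both accurate.
weights = [[0.1, -0.2, 0.05], [10.0, -5.0, 2.5]]
scales = per_channel_scales(weights)
q = quantize(weights, scales)
```

Note the largest magnitude in each row maps to ±127, regardless of how the other rows are distributed.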
https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/issues/2404
# What does this PR do?

Fix a performance issue in Mistral. Without this fix, the first token generation takes more time and performance is poor.

## Before submitting...
Hi @casper-hansen, for the general GEMM quant type, I observe that the `qweight` shapes of AutoAWQ and AutoGPTQ differ because they pack along different dimensions. I'm confused about why we introduce...
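For context on why the pack dimension changes the `qweight` shape: eight 4-bit values fit into one 32-bit word, so packing along the input-feature axis shrinks the columns by 8x, while packing along the output-feature axis shrinks the rows. A toy sketch in plain Python (illustrative only; neither library's actual packing code):

```python
# Toy illustration of how the pack dimension changes qweight's shape.
# Eight 4-bit values occupy one 32-bit word.

def pack_int4(matrix, axis):
    """Pack groups of eight 4-bit values into one int along `axis`."""
    if axis == 1:  # pack along in_features: columns shrink by 8
        return [
            [sum(row[c + i] << (4 * i) for i in range(8))
             for c in range(0, len(row), 8)]
            for row in matrix
        ]
    # axis == 0: pack along out_features: rows shrink by 8
    rows, cols = len(matrix), len(matrix[0])
    return [
        [sum(matrix[r + i][c] << (4 * i) for i in range(8))
         for c in range(cols)]
        for r in range(0, rows, 8)
    ]

w = [[r % 16] * 16 for r in range(8)]   # 8 x 16 matrix of 4-bit values
qweight_row = pack_int4(w, axis=1)       # shape becomes 8 x 2
qweight_col = pack_int4(w, axis=0)       # shape becomes 1 x 16
```

The same logical weight matrix thus yields transposed-looking `qweight` shapes depending on which axis a library chose to pack, which is why the two formats are not directly interchangeable.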
Hi folks, I hit a weird issue when reproducing the results shown in the paper. I can get the results below with a GPU visible, but cannot reproduce them with only a CPU. I...
## Type of Change

feature

## Description

- [x] implement `incbench` command as an entrypoint for easy benchmarking
- [x] automatically check NUMA/socket info and dump it as a table for ease of understanding...
## Type of Change

bug fix

## Description

Fix the bf16 `symbolic_trace` bug, which:
1. causes abnormal recursive calling
2. is missing necessary attributes

By moving the BF16 fallback ahead of quantization and removing...
lm_head quantization still has some issues:
- need a deepcopy if `tied_word_embedding = True`
- export is not applied for lm_head

Shall we warn users that lm_head is not supported? @WeiweiZhang1...
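The deepcopy requirement exists because with tied word embeddings, lm_head and the input embedding share one underlying weight object, so modifying lm_head in place would silently corrupt the embedding. A minimal sketch of the hazard and the fix (plain Python stand-ins, not the project's code):

```python
import copy

# With tied_word_embedding = True, lm_head and the embedding reference
# the SAME weight object, so an in-place change to one hits both.
embedding_weight = [[1.0, 2.0], [3.0, 4.0]]
lm_head_weight = embedding_weight          # the tie: same object

# Deep-copying first breaks the tie, so lm_head can be modified safely.
lm_head_weight = copy.deepcopy(embedding_weight)
lm_head_weight[0][0] = 0.0                 # stand-in for a quantization step

# The embedding is unaffected because the tie was broken before the edit.
```

Without the `deepcopy`, the assignment to `lm_head_weight[0][0]` would have overwritten `embedding_weight[0][0]` as well.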