Wang, Chang
I also validated chatglm and chatglm3; both work. Could you fix the root cause? https://huggingface.co/THUDM/chatglm2-6b/discussions/97 `python main.py --model hf-causal --model_args pretrained=THUDM/chatglm2-6b,trust_remote_code=True --tasks lambada_openai --limit 10 --batch_size 1 --no_cache`
# What does this PR do? This PR improves the weight-only quantization config and the quantize API with the INC 3.0 API. Status: WIP. 1. Remove the...
# What does this PR do? WOQ-quantized models have been saved in a format like https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ/tree/main since ITREX v1.4; the quantization config should be added to model.config when the model is saved, and...
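A minimal sketch of the save-time behavior described above, assuming hypothetical names (`save_with_quantization_config`, a plain-dict config); real code would attach the config via transformers' `save_pretrained` flow rather than writing `config.json` by hand:

```python
import json
import os

def save_with_quantization_config(model_config: dict,
                                  quantization_config: dict,
                                  save_dir: str) -> None:
    """Embed the WOQ quantization config into the model config before
    writing config.json, mirroring the GPTQ-style repo layout."""
    merged = dict(model_config)  # do not mutate the caller's config
    merged["quantization_config"] = quantization_config
    os.makedirs(save_dir, exist_ok=True)
    with open(os.path.join(save_dir, "config.json"), "w") as f:
        json.dump(merged, f, indent=2)
```

A loader can then recover the quantization settings from `config.json` alone, without any sidecar files.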
## Type of Change Waiting for INC to support exporting the compressed model. ## Description detail description JIRA ticket: xxx ## Expected Behavior & Potential Risk the expected behavior that triggered by this...
## Type of Change feature or bug fix or documentation or others API changed or not ## Description detail description JIRA ticket: xxx ## Expected Behavior & Potential Risk the...
I plan to load an fp8 model with the following config: Linear is fp8, while the KV cache and other ops are bf16. ``` FP8Config(allowlist={"types": ["Linear"], "names": []}, blocklist={"types": [], "names":...
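A self-contained sketch of the intended allowlist/blocklist resolution, assuming this simplified semantics (the actual FP8Config matching rules in INC may differ); `resolve_dtype` and the fallback to bf16 are illustrative names, not the library API:

```python
def resolve_dtype(op_type: str, op_name: str,
                  allowlist: dict, blocklist: dict) -> str:
    """Pick fp8 for ops matched by the allowlist and not by the blocklist;
    everything else (e.g. the KV cache) falls back to bf16.
    Empty allowlist entries match nothing; empty blocklist entries block nothing."""
    blocked = op_type in blocklist["types"] or op_name in blocklist["names"]
    allowed = op_type in allowlist["types"] or op_name in allowlist["names"]
    return "fp8" if allowed and not blocked else "bf16"

# The config from the comment above: only Linear ops go to fp8.
allowlist = {"types": ["Linear"], "names": []}
blocklist = {"types": [], "names": []}
```

Under this reading, a `Linear` layer resolves to fp8 while a KV-cache op resolves to bf16, which matches the stated intent.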
### System Info ```shell U22 HABANA G3
ubuntu22.04
pt 2.5.1
v 1.19.0 b 486 ``` ### Information - [X] The official example scripts - [ ] My own modified scripts...