Zhuang Liu
Zhuang Liu
## Describe the bug datasets fail to process SQuADv1.1 with max_seq_length=128, doc_stride=96 when calling datasets["train"].train_dataset.map(). ## Steps to reproduce the bug I used huggingface[ TF2 question-answering examples](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering). And my scripts...
### System Info python version, 3.7 transformers version, 4.26.1 ### Who can help? @ArthurZucker, @younesbelkada ### Information - [ ] The official example scripts - [X] My own modified scripts...
Thanks for the documnent about quantization here: https://github.com/mlcommons/inference_results_v2.1/blob/master/closed/NVIDIA/documentation/calibration.md I'm learning the FP8 part and find that the FP8 quantizaiton equation doesn't make sense to me. As we know, fp8 is...
**Describe the bug** I was training to run sft based on Mixtral-8x7B-instruct model with tensor parallel size=4 (sequence parallel=True) and LoRA (target modules =[all]). It reports that the output dims...