Cong Chen
> Hi, we are aware that some TE implementations won't generate identical results to those of HF (which uses native PyTorch). We use our fused version of operations to speed...
> +1, same problem. In my case, the error occurs when training with DeepSpeed zero3_offload and multi-image inputs (generating completions for GRPO), but it seems OK during evaluation and...
> Yes, that's right [@fushh](https://github.com/fushh), I think it is an issue related to the Hugging Face version. Which Hugging Face version are you using? Thanks.
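
For anyone answering the version question, here is a minimal sketch that collects installed package versions using only the standard library. The package list (`transformers`, `torch`, `deepspeed`, `transformer_engine`) and the helper name `report_versions` are assumptions for illustration, not something stated in this thread; adjust them to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list (not taken from the thread); edit as needed.
PACKAGES = ["transformers", "torch", "deepspeed", "transformer_engine"]


def report_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    """
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print one "name: version" line per package, ready to paste into a reply.
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Pasting the printed lines into a reply should make it easier to tell whether the mismatch tracks a particular version combination.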