tkdcjf159 issues

Results 2 issues of tkdcjf159

[BUG] Sequence Parallel(Ulysses) Training Gradient Scaling Issue

When training a language model (LM) with DeepSpeed's Sequence Parallel (Ulysses), it's typical to get a cross-entropy loss for each rank. To compute the gradients accurately, as I understand it,...

bug

training

evaluation methodology issue

Hello, and thank you for releasing such a great benchmark. I’m opening this issue because I have questions about the evaluation methodology. OCRBench v2 computes accuracy for each category—Recognition, Referring,...