Michal Futrega

Results 3 issues of Michal Futrega

I’ve found that setting max_grad_norm has no effect, and we are not clipping gradients. For verification, I ran convergence with max_grad_norm 1e-9 and saw no difference in eval loss, and...
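To illustrate why the `max_grad_norm 1e-9` check is a meaningful test, here is a minimal sketch of clipping by global norm in plain Python. The helper `clip_by_global_norm` and its `eps` parameter are hypothetical names for illustration, not the framework's actual implementation. If clipping of this kind were active, a threshold of 1e-9 would shrink every update to almost nothing, so an unchanged eval loss suggests the setting is not being applied.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Return a scaled copy of grads whose global L2 norm is at most max_norm.

    Hypothetical helper for illustration only; real frameworks operate on
    tensors, but the scaling rule is the same idea.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale factor is capped at 1.0, so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


grads = [3.0, 4.0]  # global L2 norm is 5.0

# With an extreme threshold, the clipped gradients are vanishingly small.
clipped, norm_before = clip_by_global_norm(grads, max_norm=1e-9)
norm_after = math.sqrt(sum(g * g for g in clipped))

# With a threshold above the gradient norm, nothing changes.
unclipped, _ = clip_by_global_norm(grads, max_norm=10.0)
```

Comparing the gradient norm before and after the clipping step, as `norm_before` and `norm_after` do here, is a more direct check than comparing eval loss across runs.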

# What does this PR do ? Add a one line overview of what this PR aims to accomplish. **Collection**: [Note which collection this PR will affect] # Changelog -...

CI
Run CICD

> [!IMPORTANT] > The `Update branch` button must only be pressed in very rare occasions. > An outdated branch is never blocking the merge of a PR. > Please reach...

core