Anton Vlasjuk

Results 7 issues of Anton Vlasjuk

# What does this PR do? Introduces new tests that check if a data collator might introduce side effects, i.e. the given input changes after the call to the collator....
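As a rough illustration of the kind of check described above (not the PR's actual test code), a side-effect test can snapshot the input, call the collator, and compare. The sketch below assumes plain Python lists and dicts as features; `mutating_collator`, `pure_collator`, `has_side_effects`, and `make_features` are hypothetical names invented for this example and are not part of `transformers`.

```python
import copy


def mutating_collator(features):
    # Hypothetical collator with a side effect: pop() removes "label"
    # from the caller's own dicts.
    labels = [f.pop("label") for f in features]
    return {"input_ids": [f["input_ids"] for f in features], "labels": labels}


def pure_collator(features):
    # Hypothetical collator that only reads from its input.
    labels = [f["label"] for f in features]
    return {"input_ids": [list(f["input_ids"]) for f in features], "labels": labels}


def has_side_effects(collator, features):
    # Snapshot the input, run the collator, then compare against the snapshot.
    snapshot = copy.deepcopy(features)
    collator(features)
    return features != snapshot


def make_features():
    return [
        {"input_ids": [1, 2, 3], "label": 0},
        {"input_ids": [4, 5, 6], "label": 1},
    ]


print(has_side_effects(mutating_collator, make_features()))
print(has_side_effects(pure_collator, make_features()))
```

With tensor-valued features, the final `!=` comparison would need to be replaced by an element-wise equality check, since comparing tensors does not yield a single boolean.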

# What does this PR do? It only addresses typos and a wrong shape annotation in the comments of `mamba`'s slow forward call. There's no change in the logic or...

# What does this PR do? See #31707 for a detailed rundown. Fixes #31707. TL;DR: GaLore still has issues displaying the correct lr, this time due to the lr scheduler....

# What does this PR do? Basically a continuation of #32677, this time implementing the fixes for Jamba. Batched generation tests might need to be changed, especially the logits,...

# What does this PR do? Extends the Mamba2 conversion script to be compatible with the paper models and codestral. I need some help handling the tokenizer or more specifically...

Continuation of #39228 for the VL models.

Current inference script for testing (torch 2.9.1):

```python
import requests
from PIL import Image
from transformers import AutoConfig, AutoModelForImageTextToText, AutoProcessor

model_path = "/raid/anton/code/forks/transformers/AntonV/ErnieVL"...
```