xxtars
Hello, I would like to ask for assistance with a problem I've encountered. I am currently training an MLLM with DeepSpeed, and I've introduced an additional modality to the...
**Describe the bug** When training [llama-vid](https://github.com/dvlab-research/LLaMA-VID) (stage 2, full fine-tuning of LLaMA) with deepspeed==0.14.0 and the transformers Trainer, grad_norm becomes NaN (or 1.414 with a smaller lr, pink line) and the loss goes to 0...
Hello, thank you for releasing the code. I'm excited to work with it, but I've encountered some issues while trying to replicate the work. I noticed that in [datasets.py](https://github.com/whwu95/ATM/blob/main/datasets.py),...