Is it clear?
@tyler-romero Thank you for the quick response. I simply used Liger through the `--use_liger_kernel=True` option in the Hugging Face trainer. While it is true that Qwen-2.5 uses the same architecture as Qwen-2,...
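For reference, this is roughly how I enabled it (a minimal sketch, assuming transformers >= 4.45, which exposes `use_liger_kernel` on `TrainingArguments`, and `liger-kernel` installed; everything other than the flag is a placeholder, not my exact setup):

```python
from transformers import TrainingArguments

# `use_liger_kernel=True` asks the Trainer to patch supported architectures
# with Liger's fused kernels before training starts.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    use_liger_kernel=True,
)
# Scripts that build TrainingArguments via HfArgumentParser accept the same
# option on the command line, e.g. `--use_liger_kernel=True`.
```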
@DarkLight1337 `MllamaForConditionalGeneration` has additional text layers. For instance, the `meta-llama/Llama-3.2-11B-Vision-Instruct` model (https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) has about 9.7B parameters in its text layers alone (unlike `meta-llama/Llama-3.1-8B-Instruct`). Therefore, I believe `MllamaForCausalLM`, derived from `MllamaForConditionalGeneration`, is different from `LlamaForCausalLM`. Is...
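If it helps, here is a minimal sketch of how the two text-side configs can be compared (assuming a transformers version with Mllama support and access to the gated repos; the values in the comments are what I expect, not verified output):

```python
from transformers import AutoConfig

vision_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
text_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Mllama nests its language-model settings under `text_config`; the extra
# decoder layers are the cross-attention ones listed below.
print(vision_cfg.text_config.num_hidden_layers)       # 40 for the 11B vision model
print(vision_cfg.text_config.cross_attention_layers)  # indices of the 8 cross-attention layers
print(text_cfg.num_hidden_layers)                     # 32 for Llama-3.1-8B
```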
Umm, sorry, I didn't understand your workaround. Can I get an example?
OK, it seems like that would work. Once your implementation is in place, I will edit the config and weights of the Llama-3.2-Vision models (e.g. `meta-llama/Llama-3.2-11B-Vision-Instruct`).
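To be concrete, this is roughly what I have in mind for "editing config and weights": carving a `LlamaForCausalLM`-style text checkpoint out of the Mllama weights. This is only a rough sketch based on my guess at the checkpoint layout (the `language_model.` key prefix, dropping the cross-attention layers and renumbering the rest); it is not the actual workaround discussed above, and the embeddings, final norm, `lm_head`, and the extra `<|image|>` token would still need handling before saving.

```python
import torch
from transformers import MllamaForConditionalGeneration

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)
text_cfg = model.config.text_config
cross_attn = set(text_cfg.cross_attention_layers)

# Map the remaining self-attention layer indices to a consecutive range so they
# line up with a plain LlamaConfig (32 layers for the 8B-sized text model).
kept = [i for i in range(text_cfg.num_hidden_layers) if i not in cross_attn]
remap = {old: new for new, old in enumerate(kept)}

new_state_dict = {}
for key, value in model.state_dict().items():
    if not key.startswith("language_model.model.layers."):
        continue
    parts = key.split(".")
    old_idx = int(parts[3])
    if old_idx in cross_attn:
        continue  # drop the cross-attention layers entirely
    parts[3] = str(remap[old_idx])
    new_state_dict[".".join(parts[1:])] = value  # strip the "language_model." prefix
```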
@heheda12345 Oh, that is a good point. I didn't know the 8 cross-attention layers are not used, so those account for the additional parameters (9.7B vs. 8B) that end up unused. Thank you. I appreciate your excellent...