Is it clear?
@tyler-romero Thank you for the quick response. I simply used Liger through the `--use_liger_kernel=True` option in the Hugging Face trainer. While it is true that Qwen-2.5 uses the same architecture as Qwen-2,...
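For reference, this is roughly how I enabled it (a minimal sketch, assuming transformers >= 4.45, which exposes `use_liger_kernel` on `TrainingArguments`, and `liger-kernel` installed; everything other than the flag is a placeholder, not my exact setup):

```python
from transformers import TrainingArguments

# `use_liger_kernel=True` asks the Trainer to patch supported architectures
# with Liger's fused kernels before training starts.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    use_liger_kernel=True,
)
# Scripts that build TrainingArguments via HfArgumentParser accept the same
# option on the command line, e.g. `--use_liger_kernel=True`.
```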
@DarkLight1337 `MllamaForConditionalGeneration` has additional text layers. For instance, the `meta-llama/Llama-3.2-11B-Vision-Instruct` model (https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) has about 9.7B parameters in its text layers alone (unlike `meta-llama/Llama-3.1-8B-Instruct`). Therefore, I believe `MllamaForCausalLM`, derived from `MllamaForConditionalGeneration`, is different from `LlamaForCausalLM`. Is...
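If it helps, here is a minimal sketch of how the two text-side configs can be compared (assuming a transformers version with Mllama support and access to the gated repos; the values in the comments are what I expect, not verified output):

```python
from transformers import AutoConfig

vision_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
text_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Mllama nests its language-model settings under `text_config`; the extra
# decoder layers are the cross-attention ones listed below.
print(vision_cfg.text_config.num_hidden_layers)       # 40 for the 11B vision model
print(vision_cfg.text_config.cross_attention_layers)  # indices of the 8 cross-attention layers
print(text_cfg.num_hidden_layers)                     # 32 for Llama-3.1-8B
```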
Umm, sorry, I didn't understand your workaround. Can I get an example?
OK, it seems like that would work. Once your implementation is in place, I will edit the config and weights of the Llama-3.2-Vision models (e.g. `meta-llama/Llama-3.2-11B-Vision-Instruct`).
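To be concrete, this is roughly what I have in mind for "editing config and weights": carving a `LlamaForCausalLM`-style text checkpoint out of the Mllama weights. This is only a rough sketch based on my guess at the checkpoint layout (the `language_model.` key prefix, dropping the cross-attention layers and renumbering the rest); it is not the actual workaround discussed above, and the embeddings, final norm, `lm_head`, and the extra `<|image|>` token would still need handling before saving.

```python
import torch
from transformers import MllamaForConditionalGeneration

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)
text_cfg = model.config.text_config
cross_attn = set(text_cfg.cross_attention_layers)

# Map the remaining self-attention layer indices to a consecutive range so they
# line up with a plain LlamaConfig (32 layers for the 8B-sized text model).
kept = [i for i in range(text_cfg.num_hidden_layers) if i not in cross_attn]
remap = {old: new for new, old in enumerate(kept)}

new_state_dict = {}
for key, value in model.state_dict().items():
    if not key.startswith("language_model.model.layers."):
        continue
    parts = key.split(".")
    old_idx = int(parts[3])
    if old_idx in cross_attn:
        continue  # drop the cross-attention layers entirely
    parts[3] = str(remap[old_idx])
    new_state_dict[".".join(parts[1:])] = value  # strip the "language_model." prefix
```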
@heheda12345 Oh, that is a good point. I didn't know the 8 cross-attention layers are not used, so those account for the additional parameters (9.7B vs. 8B) that end up unused. Thank you. I appreciate your excellent...