Xiang Xu

8 comments by Xiang Xu

Faced the same issue when running inference with `T5ForConditionalGeneration.from_pretrained()` to load a pre-trained model. Solution: use `trainer.save_model()` instead of `model.save_pretrained()` to save the model.
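For reference, a minimal sketch of the two save paths, assuming a standard Hugging Face `Trainer` setup (the output directory name is hypothetical):

```python
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments

model = T5ForConditionalGeneration.from_pretrained("t5-small")
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./t5-out"),  # hypothetical directory
)

# model.save_pretrained("./t5-out")  # the call that triggered the issue above
trainer.save_model("./t5-out")       # saves the model weights plus config into the output dir

# reload for inference
model = T5ForConditionalGeneration.from_pretrained("./t5-out")
```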

Hi, the issue is fixed, verified by running `python scripts/llm/gemma3_generate.py`.

@suiyoubi Yeah, the two major bugs that I fixed were: 1. the local/global layer calculator was wrong; 2. the `model.embedding = Gemma3LanguageModelEmbedding(...)` line was not executed because there was no "pre_process" and...
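For illustration, a minimal sketch of what a local/global layer calculator looks like, assuming Gemma3's interleave of five local (sliding-window) layers per global layer; the function name and layout are assumptions, not the actual NeMo fix:

```python
def is_global_attention_layer(layer_idx: int, interleave: int = 6) -> bool:
    """Assumed pattern: every `interleave`-th layer uses global attention;
    the remaining layers use local sliding-window attention."""
    return (layer_idx + 1) % interleave == 0

# e.g. for a 12-layer model, layers 5 and 11 (0-based) come out global
layout = ["global" if is_global_attention_layer(i) else "local" for i in range(12)]
print(layout)
```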

Hi @suiyoubi, nice talking to you offline. As we discussed, I pushed a patch commit reusing your fix in [#13582](https://github.com/NVIDIA/NeMo/pull/13582) for the VL model importer/exporter. This PR is ready...

Hi @suiyoubi, I've resolved your comments. I tested pretraining convergence on a small subset of the FineWeb dataset, and it worked well. I didn't test VL because I don't...

Hi @suiyoubi, for BOS, it's because Gemma3 was trained with this leading token during both pre-training (PT) and instruction tuning (IT). The tech report ([Section 3](https://arxiv.org/pdf/2503.19786)) also indicates the BOS token is explicitly...
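As a quick sanity check, the Hugging Face Gemma tokenizers prepend `<bos>` by default, which is easy to verify (the checkpoint name here is an assumption):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")  # assumed checkpoint
ids = tok("The quick brown fox").input_ids
# the first id should be the BOS token because add_bos_token defaults to True
assert ids[0] == tok.bos_token_id
```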

After some deep investigation, I found that the current unit test passes despite the bug only because of an unexpected behavior (or bug) in xformers. Although the K and...
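The truncated comment above doesn't show the exact xformers behavior, but as a minimal sketch, an equivalence test of this shape would catch such a divergence by comparing `xformers.ops.memory_efficient_attention` against a naive reference (shapes, dtype, and tolerances are assumptions):

```python
import torch
import xformers.ops as xops

def ref_attention(q, k, v):
    # naive reference; q, k, v are (batch, seq, heads, head_dim)
    q_, k_, v_ = (t.transpose(1, 2) for t in (q, k, v))  # -> (batch, heads, seq, head_dim)
    scores = q_ @ k_.transpose(-2, -1) / q_.shape[-1] ** 0.5
    return (torch.softmax(scores, dim=-1) @ v_).transpose(1, 2)

B, M, H, K = 2, 128, 8, 64
q, k, v = (torch.randn(B, M, H, K, device="cuda", dtype=torch.float16) for _ in range(3))
out = xops.memory_efficient_attention(q, k, v)
torch.testing.assert_close(out, ref_attention(q, k, v), atol=1e-2, rtol=1e-2)  # loose fp16 tolerance
```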

It doesn't look like a pickle issue; the root cause appears to be:

```
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
```

which might be caused by a low-level invalid memory...
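A generic way to localize this kind of failure (a debugging sketch, not specific to this issue): CUDA errors are reported asynchronously, so forcing synchronous launches makes the Python stack trace point at the op that actually failed. Note that `CUBLAS_STATUS_ALLOC_FAILED` at `cublasCreate` often just means the device is out of memory when the cuBLAS handle is created.

```python
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

import torch
x = torch.randn(8, 8, device="cuda")  # any CUDA op now fails at its real call site
```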