Samit
"we consider the first frame of a video to be an image..." I see, so the first frame is always encoded from the 1st frame repeated k-1 times. But for upsampling, the...
> Sorry for that. We merged that change to fix this bug, thanks. BTW, since the computation logic has changed, the model may require re-training.
+1 Looking forward to the open-sourcing of the text2video model
I see. So the attention map complexity will be (H*W*T)^2. Is that feasible for long-video training? Are there any generation results from the training code? (Loss curve in diffusion model...
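To make the scaling concern concrete, here is a quick back-of-envelope sketch of how the materialized attention-map size grows with full 3D (spatio-temporal) attention. The function name and the example shapes are illustrative, not from the repo:

```python
# Rough cost model for full 3D attention over an H x W x T latent grid,
# assuming the (H*W*T) x (H*W*T) attention map is materialized.
def attention_map_elements(h: int, w: int, t: int) -> int:
    """Entries in the attention map per head per layer (batch excluded)."""
    n = h * w * t  # total number of tokens attended over jointly
    return n * n

# Example: a 32x32 latent over 16 frames -> (32*32*16)^2 entries.
print(attention_map_elements(32, 32, 16))
# Doubling T quadruples this, which is why long-video training is the worry.
print(attention_map_elements(32, 32, 32))
```

Factorized spatial/temporal attention would instead cost roughly (H*W)^2 * T + T^2 * H*W, which is the usual workaround when full 3D attention is too expensive.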
Please supplement the README with accuracy and performance comparisons against ViT.
Please report the results for the CRNN server version and upload the checkpoint and MindIR file.
Thanks. Checkpoint saving: save a ckpt at the end of every epoch, with an optional last_k or top_k retention policy.
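A minimal sketch of the two retention policies mentioned above (last_k keeps the k most recent checkpoints, top_k keeps the k best by a validation metric). The `CheckpointManager` class and its API are illustrative, not from this repo:

```python
# Sketch of per-epoch checkpoint retention with last_k / top_k policies.
class CheckpointManager:
    def __init__(self, policy: str = "last_k", k: int = 3):
        assert policy in ("last_k", "top_k")
        self.policy = policy
        self.k = k
        self.records = []  # list of (metric, path), in save order

    def save(self, epoch: int, metric: float) -> list:
        """Register a ckpt saved after `epoch`; return paths to delete."""
        path = f"ckpt_epoch_{epoch}.ckpt"
        self.records.append((metric, path))
        dropped = []
        if self.policy == "last_k":
            # Evict the oldest checkpoints beyond the last k.
            while len(self.records) > self.k:
                dropped.append(self.records.pop(0)[1])
        else:
            # top_k: evict the checkpoint with the worst metric.
            if len(self.records) > self.k:
                worst = min(self.records, key=lambda r: r[0])
                self.records.remove(worst)
                dropped.append(worst[1])
        return dropped
```

Usage: `CheckpointManager("top_k", k=2)` keeps the two checkpoints with the highest metric and reports the evicted file so the training loop can delete it.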
> Thanks for this contribution! As we discussed offline, we'll be carefully reviewing this PR/design and think about how to enable end-to-end support for models like this with vLLM! looking...
Should we consider supporting E/P/D (encode/prefill/decode) disaggregation for large-scale multimodal model serving? It would be a beneficial feature for large-batch or encode-compute-heavy MLLM deployment scenarios. https://github.com/vllm-project/vllm/pull/25233
I think we can first support TP and CP for diffusion models by re-using the parallelism interfaces in vLLM. Then we can verify whether CP interfaces like `sequence_parallel_chunk`...
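For illustration, the core behavior a CP chunking interface has to get right is splitting the token sequence evenly across CP ranks, padding when the length is not divisible by the world size. The helper below is a hypothetical sketch of that behavior, not the actual `sequence_parallel_chunk` implementation:

```python
# Hypothetical sketch of context-parallel sequence chunking: each CP rank
# owns one contiguous, equal-sized slice of the (padded) token sequence.
def cp_chunk(tokens: list, rank: int, world_size: int, pad=0) -> list:
    """Return the contiguous chunk of `tokens` owned by `rank`."""
    remainder = len(tokens) % world_size
    if remainder:
        # Pad so every rank gets a chunk of identical length.
        tokens = tokens + [pad] * (world_size - remainder)
    chunk = len(tokens) // world_size
    return tokens[rank * chunk : (rank + 1) * chunk]
```

Whether such an interface carries over cleanly from autoregressive LLM inputs to diffusion-model latents (where tokens form an H*W*T grid rather than a 1D sequence) is exactly the kind of thing that would need verifying.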