Yang Yang
Hello, could I join the group chat? Could you please send the QR code? Thanks. [email protected]
> Hi, may I ask why you calculate the sqrt here?
I encountered the same issue. I found that it is caused by denormal numbers (< 1e-32) in the weights; please refer to https://discuss.pytorch.org/t/conv2d-is-very-slow-on-trained-weights-vs-random-weights/43377/4. BTW, which dataset did you use for training?
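If it helps, a minimal sketch of how one might confirm and work around this (the 1e-32 cutoff and the flush call are my own suggestion, not something from this repo):

```python
import torch

def denormal_fraction(model: torch.nn.Module) -> float:
    """Fraction of nonzero weights below an assumed 1e-32 magnitude cutoff."""
    small, total = 0, 0
    for p in model.parameters():
        small += ((p != 0) & (p.abs() < 1e-32)).sum().item()
        total += p.numel()
    return small / max(total, 1)

# On CPUs that support it, flushing denormals to zero usually
# restores normal conv2d speed on trained weights.
torch.set_flush_denormal(True)
```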
I just used the default values and ran tag v1.1 with the command: `python main.py --mode train --input ./data/UTKFace --output ./results`. I still got the same error.
To study the concrete implementation, you can print the processor's input/output shapes step by step. Let me first explain the final output: after going through the processor, multiple frames become a flattened sequence of patches, with shape: (grid_t * grid_h * grid_w, in_channel * temporal_patch_size * patch_size * patch_size)
- grid_x is the number of grids along dimension x, e.g. grid_h = image_height // patch_size
- in_channel: the number of input image channels, default is 3 for RGB
- temporal_patch_size:...
The order of patches in this sequence also has to take the three dimensions and the spatial merge operation into account. Take two frames as an example, each of size (1, 6, 8); assuming in_channel = 1 and patch_size = 1, each patch is a single pixel, numbered as follows: `[[[ [ 1 1 2 3 4 5 6 7], [ 8 9 10 11 12 13 14...`
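As a cross-check of the shape described above, here is a minimal sketch (the concrete sizes are my own, and it skips the spatial-merge-aware reordering of patches, so only the final shape matches what the processor produces):

```python
import torch

# hypothetical settings: 3-channel video, 2 frames, patch_size 14
in_channel, temporal_patch_size, patch_size = 3, 2, 14
t, h, w = 2, 28, 28  # frame count and spatial size, multiples of the patch sizes

grid_t = t // temporal_patch_size
grid_h = h // patch_size
grid_w = w // patch_size

video = torch.randn(t, in_channel, h, w)

# split each axis into a (grid, patch) pair ...
patches = video.view(grid_t, temporal_patch_size, in_channel,
                     grid_h, patch_size, grid_w, patch_size)
# ... then move the grid axes to the front and flatten each patch
patches = patches.permute(0, 3, 5, 2, 1, 4, 6).reshape(
    grid_t * grid_h * grid_w,
    in_channel * temporal_patch_size * patch_size * patch_size,
)
print(patches.shape)  # torch.Size([4, 1176])
```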
I have the same question. It feels like the conv3d should be applied first and the result then flattened into a token sequence, but the code implementation does exactly the opposite. Raised a thread here to draw more visibility: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/discussions/39
See the question I posted on HF: judging from the attention mask, the model does not learn any association across multiple frames; attention only happens inside each two-frame group. In other words, whether or not a 3D conv is applied along the time dim, the time dimension still isn't attended to in the end. Purely in theory, the video understanding capability is therefore somewhat limited.
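To make the "attention stays inside each two-frame group" point concrete, a toy block-diagonal mask might look like this (the group size and sequence length are invented for illustration):

```python
import torch

tokens_per_group = 4   # hypothetical tokens per two-frame group
num_groups = 3
seq_len = tokens_per_group * num_groups

group_id = torch.arange(seq_len) // tokens_per_group
# True where attention is allowed: only within the same group,
# so no information flows across frame groups
attn_mask = group_id[:, None] == group_id[None, :]
print(attn_mask.int())
```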
```python
import torch
import torch.nn as nn


class FramePatchEmbed(nn.Module):
    def __init__(self, patch_size, temporal_patch_size, in_channels,
                 embed_dim, spatial_merge_size):
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.in_channels = in_channels
        self.embed_dim = embed_dim
        self.spatial_merge_size = spatial_merge_size
        kernel_size = (temporal_patch_size, patch_size, patch_size)
        # completion below follows Qwen2-VL's PatchEmbed:
        # non-overlapping 3D patches, stride equals kernel size, no bias
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=kernel_size, stride=kernel_size,
                              bias=False)

    def forward(self, hidden_states):
        # input: (num_patches, in_channels * temporal_patch_size * patch_size * patch_size)
        hidden_states = hidden_states.view(
            -1, self.in_channels, self.temporal_patch_size,
            self.patch_size, self.patch_size,
        )
        # project each patch and flatten back to (num_patches, embed_dim)
        return self.proj(hidden_states).view(-1, self.embed_dim)
```
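For reference, a quick smoke test with hypothetical ViT-style sizes (not values taken from this thread):

```python
embed = FramePatchEmbed(patch_size=14, temporal_patch_size=2,
                        in_channels=3, embed_dim=1280, spatial_merge_size=2)
flat_patches = torch.randn(4, 3 * 2 * 14 * 14)  # 4 flattened patches
print(embed(flat_patches).shape)                # torch.Size([4, 1280])
```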
I noticed that the recent Qwen2.5-VL, compared with Qwen2-VL, also applies the idea of dynamic resolution along the temporal dimension, which is an improvement addressing the problems raised in these two issues.