CMG issues

code/src/model/main_model_2.py, line 790

1

Hello, while reading your code I think I found a problem at line 790 of `code/src/model/main_model_2.py`:

```python
for i in unactivated_indices:
    self.embedding[i] = activated_quantized[random.randint(0,len(activated_indices)-1)] + torch.Tensor(256).uniform_(-1/1024, -1/1024).cuda()
```

I believe the range here should be (-1/1024, 1/1024) rather than (-1/1024, -1/1024). The same issue also appears on lines 977 and 1152. I hope this helps with your work :D
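The difference between the two ranges can be checked without PyTorch. The sketch below uses the standard library's `random.uniform` as a stand-in for `Tensor.uniform_`; the helper name `jitter` and the fixed seed are illustrative assumptions, not part of the repository.

```python
import random

def jitter(dim, low, high, seed=0):
    """Draw `dim` samples uniformly from [low, high] (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(dim)]

bound = 1 / 1024

# Range as written in the reported line: both ends are -1/1024,
# so every sample collapses to the same constant offset.
degenerate = jitter(256, -bound, -bound)

# Range proposed in this issue: symmetric around zero,
# so the samples actually differ from each other.
symmetric = jitter(256, -bound, bound)

print(len(set(degenerate)), len(set(symmetric)))
```

With the degenerate range, each re-initialised code vector would receive the same constant shift instead of random noise, which is presumably not what was intended.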

Sacamore

About lld_loss and mi_loss during training

7

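Hello, I am trying to follow your work and transfer it to another domain, but I ran into the following problems during training:

1. lld_loss does not converge, which makes the mutual-information upper-bound estimate inaccurate and affects training.
2. After enabling mi_loss, NaN values appear in the model parameters.
3. mi_loss keeps growing as training proceeds.

I tried adjusting the number of layers in mi_net, the learning rate, and so on, but the problems remain. Could you share more details about how the model is trained?

1. During your training, does lld_loss gradually converge, or does it stay within a stable range?
2. During backpropagation of mi_loss, are the parameters of mi_net updated?
3. What does the mi_loss curve roughly look like during training, and does it converge?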

chovyzhang

Train on my own dataset

3

If I'd like to use CMG on my own dataset (video and audio), how should I prepare the data? I have video-audio pairs; should I extract their features?...

chouliuzuo

model/CPC.py

2

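The call at line 599 of pretrain.py does not match the forward function in model/CPC.py: the parameters at line 40 and the return values at line 98 do not correspond to it.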

RXin-he

self.audio_semantic_decoder and self.Audio_decoder

1

https://github.com/haihuangcode/CMG/blob/2cbdad8f68d6000657ddf45ace97c855c022334d/code/src/model/main_model_2.py#L507C1-L515C60 Hi sir! Thanks for your great work! I have some questions I would like to ask you. I don't know if it's right to understand it this way: self.audio_semantic_decoder...

1090h2400

The three modalities have different feature sequence lengths: how should the Cross_VQEmbeddingEMA_AVT module be modified?

4

If audio_feat, video_feat, and text_feat all have different sequence lengths, the forward pass through `self.Cross_quantizer = Cross_VQEmbeddingEMA_AVT(n_embeddings, self.hidden_dim)` in AVT_VQVAE_Encoder fails:

`v_ph = torch.reshape(v_ph, ((B, T, M))) # [BxT, M] -> [B, T, M]`
`RuntimeError: shape '[16, 99, 400]' is invalid for input of size 236800`

How should the code in Cross_VQEmbeddingEMA_AVT be modified? I want to pass audio_feat, video_feat, and text_feat directly through AVT_VQVAE_Encoder to obtain the quantized, semantically aligned representations audio_vq, video_vq, and text_vq for downstream tasks.
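The numbers in the error message already show which length the failing tensor really has. The sketch below only redoes that arithmetic; the helper `infer_seq_len` is hypothetical and not part of the repository.

```python
def infer_seq_len(numel, batch, dim):
    """Return T such that batch * T * dim == numel (hypothetical helper)."""
    per_step = batch * dim
    if numel % per_step != 0:
        raise ValueError("numel is not divisible by batch * dim")
    return numel // per_step

B, M = 16, 400          # batch size and feature dim from the error message
numel = 236800          # actual number of elements in v_ph

T_assumed = 99          # length used in the failing reshape
T_actual = infer_seq_len(numel, B, M)

print("elements needed for T=99:", B * T_assumed * M)
print("actual sequence length:", T_actual)
```

So the reshape appears to reuse the length `T` of another modality for a tensor whose own length is 37. Reshaping with an inferred time dimension, for example `torch.reshape(v_ph, (B, -1, M))`, would let each modality keep its own length; whether the rest of Cross_VQEmbeddingEMA_AVT accepts unequal lengths is not something this sketch can confirm.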

yhd-123

embedding updated in MM_EMA

4

In Cross_VQEmbeddingEMA in main_model_2.py, self.embedding is updated three times [`self.embedding = self.ema_weight / self.ema_count.unsqueeze(-1)`], but does only the last assignment take effect?
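One way to reason about this is a scalar toy model. It assumes that the EMA buffers persist and are updated before each of the three assignments, which this issue does not confirm; the function `ema_step` and all numbers are made up for illustration.

```python
def ema_step(ema_count, ema_weight, count, weight_sum, decay=0.99):
    """One EMA update followed by the embedding assignment (toy, scalar)."""
    ema_count = decay * ema_count + (1 - decay) * count
    ema_weight = decay * ema_weight + (1 - decay) * weight_sum
    return ema_count, ema_weight, ema_weight / ema_count

# Three sequential updates, one per modality, sharing the same buffers.
ema_count, ema_weight = 1.0, 1.0
snapshots = []
for count, weight_sum in [(4.0, 8.0), (2.0, 2.0), (6.0, 18.0)]:
    ema_count, ema_weight, embedding = ema_step(
        ema_count, ema_weight, count, weight_sum
    )
    snapshots.append(embedding)

# For comparison: applying only the third update to fresh buffers.
_, _, last_only = ema_step(1.0, 1.0, 6.0, 18.0)

print(snapshots, last_only)
```

Under that assumption, the first two assignments to the embedding are indeed overwritten, but the final value still differs from applying the last update alone, because the earlier updates remain in the shared buffers.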

RIU-13

Issue with requirements.txt

1

Hi, I've been trying to set up the project using the provided requirements.txt file, but I'm encountering multiple dependency conflicts during the installation process. Specifically, issues with package versions that...

tallabon8

About CPC

10

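Hello, thank you for your contribution. I have a few questions about Cross_CPC:

1. About the NCE loss: nce is accumulated over 2.2 times the batch size (lines 88-91 of CPC.py), but in the end it is divided by only one batch size. Is the nce value expected to be large? When I validated on my own dataset, the value was very large at the start.
2. My own dataset has no time dimension, so I feed the vq of modality A into the predictor to predict the vq of modality B and then compute nce. Is this a valid approach? Are there other ways to maximize their mutual information?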
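As a reference point for the first question, a generic InfoNCE loss has a known value when the predictor is uninformative. The sketch below is a plain-Python illustration and not the code in CPC.py; the function names are assumptions.

```python
import math

def info_nce(scores):
    """Mean InfoNCE loss for a square score matrix.

    scores[i][j] is the similarity between prediction i and target j;
    the diagonal entries are the positive pairs.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_denominator - row[i]
    return total / n

def uninformative_loss(n):
    """Loss when every pair scores the same, i.e. before anything is learned."""
    return info_nce([[0.0] * n for _ in range(n)])

for n in (8, 64, 256):
    print(n, uninformative_loss(n), math.log(n))
```

With equal scores the per-sample loss equals log(N) for N candidates, so a large initial value is expected. If a loss sums several such terms but divides by the batch size only once, the reported number grows in proportion to the number of summed terms.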

casm1

Timestamps in the vggsound-avel50k/100K

1

Hi, thank you for your excellent work on “Learning Probabilistic Existence-Nonexistence Evidence for Weakly Supervised Audiovisual Event Perception”. It has helped me tremendously in my research. While trying to access...

HHH123333
