Zeqian Ju
Hi! At the moment, the ``SparseAttention`` class inherits from ``Attention``, but it does not support the cache mechanism. I guess that is the cause of the ``unexpected keyword argument`` bug. ## CODE ``dalle_pytorch/attention.py``...
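To illustrate the kind of mismatch I mean, here is a minimal sketch with plain Python stand-ins. These are not the actual ``dalle_pytorch`` classes: the ``forward`` signatures, the ``cache_key`` argument, ``TolerantSparseAttention`` and ``call_layer`` are all assumptions made up for the example. It only shows how an override that drops ``cache`` fails when the caller always passes it, and one possible workaround.

```python
# Hypothetical stand-ins; NOT the real dalle_pytorch classes or signatures.
class Attention:
    def forward(self, x, mask=None, cache=None, cache_key=None):
        # The base class accepts the cache arguments.
        return x


class SparseAttention(Attention):
    def forward(self, x, mask=None):
        # The override drops `cache`, so passing it raises TypeError.
        return x


class TolerantSparseAttention(Attention):
    def forward(self, x, mask=None, **kwargs):
        # One possible workaround (an assumption, not the library's fix):
        # accept and ignore any cache-related keyword arguments.
        return x


def call_layer(layer, x):
    # A caller that always passes cache arguments, as a cached decoder might.
    return layer.forward(x, mask=None, cache={}, cache_key="attn_0")


try:
    call_layer(SparseAttention(), [1, 2, 3])
    error_message = ""
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Swallowing ``**kwargs`` only silences the error; the sparse layer would still run without caching, so it may be safer to disable the cache whenever sparse attention is enabled.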
The required files in [AudiocaptionLoss config](https://github.com/yangdongchao/Text-to-sound-Synthesis/blob/master/Codebook/AudiocaptionLoss/settings/settings.yaml#L37) are missing:

```
path:
  vocabulary: 'data/pickles/words_list.p'
  encoder: 'pretrained_models/audioset_deit.pth'  # 'pretrained_models/deit.pth'
  word2vec: 'pretrained_models/word2vec/w2v_512.model'
  eval_model: 'pretrained_models/ACTm.pth'
```
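For anyone who wants to check which of these files are absent in their own checkout, a small sketch follows. The four paths are copied from the config quoted above; the ``REQUIRED_FILES`` dict, the ``find_missing`` helper and the assumption that the paths are relative to the current working directory are mine, not part of the repository.

```python
from pathlib import Path

# Paths copied from the `path:` section of the settings file quoted above.
REQUIRED_FILES = {
    "vocabulary": "data/pickles/words_list.p",
    "encoder": "pretrained_models/audioset_deit.pth",
    "word2vec": "pretrained_models/word2vec/w2v_512.model",
    "eval_model": "pretrained_models/ACTm.pth",
}


def find_missing(required, root="."):
    """Return the config keys whose files do not exist under `root`."""
    root = Path(root)
    return [key for key, rel in required.items() if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the directory the paths are relative to.
    for key in find_missing(REQUIRED_FILES):
        print(f"missing: {key} -> {REQUIRED_FILES[key]}")
```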
Hi, could you please share some caption examples for pretraining on AudioSet? I'm a little confused about the [mask] token setting for the CLIP text encoder.