SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades the conventional diffusion model with an additional semantic prior.
Hello, when I ran 1_train_xe.sh for training, I got an error saying that 446093.npz does not exist. How can I solve this problem? The previous path was .../open_source_dataset/mscoco_dataset/features/up_down, and the data can be found there.
Hello, I would like to know what the fields in mscoco_clip_ret_sents.pkl specifically represent. Does "r_token_ids" store the 20 sentences produced by the cross-modal retrieval model? Is the cross-modal retrieval model CLIP? Thanks.
Hello! I have a question: since the diffusion model generates sentences starting from random noise, the generated sentences should be diverse. Have you conducted any experiments on diversity?
[zipfile.BadZipFile File is not a zip file.txt](https://github.com/jianjieluo/SCD-Net/files/12481420/zipfile.BadZipFile.File.is.not.a.zip.file.txt) [08/31 09:51:13 xl.utils.events]: eta: 22:35:08 iter: 2839 total_loss: 2.535 MSE loss(U): 0.3188 LabelSmoothing(G) loss: 2.217 time: 0.4365 data_time: 0.2023 lr: 4.4375e-05 max_mem:...
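For the missing `446093.npz` and the `zipfile.BadZipFile` reports above, a quick scan of the feature directory can show which files are absent or unreadable before a long training run. The following is only a minimal sketch, not part of SCD-Net: the function names `check_npz` and `scan_features` are hypothetical, and it assumes features are stored as one `<image_id>.npz` file per image, which is inferred from the file name in the report.

```python
import zipfile
from pathlib import Path


def check_npz(path):
    """Return None if `path` looks like a readable .npz archive, else a short reason."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    # An .npz file is a zip archive of .npy arrays, so a failed or
    # interrupted download typically shows up as "not a zip file".
    if not zipfile.is_zipfile(path):
        return "not a zip file"
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the first bad one.
        bad_member = archive.testzip()
    return None if bad_member is None else f"corrupt member: {bad_member}"


def scan_features(feature_dir, image_ids):
    """Map each problematic '<image_id>.npz' name (assumed layout) to its failure reason."""
    problems = {}
    for image_id in image_ids:
        name = f"{image_id}.npz"
        reason = check_npz(Path(feature_dir) / name)
        if reason is not None:
            problems[name] = reason
    return problems
```

Calling `scan_features(feature_dir, image_ids)` with the IDs from the training split returns a dictionary of file names that need to be downloaded or extracted again; an empty dictionary means every file opened cleanly as a zip archive.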
Hi @jianjieluo, thanks for the amazing work! I am working on transferring the SCD-Net architecture to medical radiology report generation. But I found that the trained model...
Hello, I would like to ask about how the cross-modal model retrieves semantically related sentences from the training sentence pool. How is the training sentence pool obtained? Thank you very much!...