
[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

DiffuSeq issues (46 results)

Thank you, @summmeer, for sharing the code! I wanted to ask what the role of the tT_loss term is and which objective in the paper it corresponds to.

Hi, during training, 'grad_norm' becomes NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. Where is the problem and how can...

Thanks for your great work. I have a question about the loss calculation in training_losses_seq2seq() when the sampled time step is `t=0`: https://github.com/Shark-NLP/DiffuSeq/blob/bdc8f0adbff22e88c8530d1f20c3c7589c061d40/diffuseq/gaussian_diffusion.py#L612-L619 If `t=0`, the `x_t = self.q_sample()` line is incorrect,...
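For readers unfamiliar with the step this question refers to, here is a minimal sketch of the standard DDPM forward sample, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. It is not DiffuSeq's code and does not settle the issue, whose text is truncated above. The linear beta schedule, the 2000-step count, and the helper names `linear_betas` and `cumulative_alphas` are assumptions made for illustration; only the name `q_sample` comes from the issue.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not DiffuSeq's default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s = 0..t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, noise):
    """Standard DDPM forward sample at step index t (0-indexed)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


alpha_bars = cumulative_alphas(linear_betas(2000))
rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in x0]

# At index t=0 the sample is still perturbed by one step of noise,
# so it is close to x0 but not equal to it.
x_first = q_sample(x0, 0, alpha_bars, noise)
# At the last index almost no signal from x0 remains.
x_last = q_sample(x0, len(alpha_bars) - 1, alpha_bars, noise)
```

Under this indexing, `t=0` already denotes one noising step rather than the clean input, which is the kind of off-by-one detail that questions about the `t=0` case usually turn on.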

I'm trying to train with reference data, but I'm getting an error like this. Here are my script parameter settings and the errors. I hope to get an answer, thanks. python -m...

Thanks, @summmeer, for sharing the code. Does DiffuSeq use the DDPM model as its foundation?

Hi DiffuSeq authors! I followed the README example from training through decoding, but decoding generated nothing. ![image](https://user-images.githubusercontent.com/51283097/227854019-0b12f0c4-c316-4299-9a7c-ba8d8c544694.png) May I know if there's some problem with my...

Hi @summmeer, I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq for the paraphrase task, but I want to be...

Hello! I trained the model on the WMT16 dataset and modified the parameters to the following values: ![image](https://github.com/Shark-NLP/DiffuSeq/assets/116432930/50f1ba2a-7dce-493c-ab69-2f7cd9ab8b66) The main modifications were to dim and seq_len. What's more, I changed the...

Hi! I am trying to replicate the DiffuSeq model for the paraphrase task with the QQP dataset. I kept the default training config, and for MBR I ran...

If I want to switch to single-GPU training, which parts of the code need to be modified, and how?