DiffuSeq issues

Understanding tT_loss

Thank you @summmeer for sharing the code! I wanted to ask what role the tT_loss term plays and which objective in the paper it corresponds to.

orxh

'grad_norm' is NaN

3

Hi, during training, 'grad_norm' becomes NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. Where is the problem and how can...

LikeStarting
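One common cause of a NaN or infinite gradient norm under FP16 is overflow in the scaled gradients. The standard mitigation is dynamic loss scaling: unscale the gradients, check whether their global norm is finite, and if not, skip the update and lower the scale. The sketch below illustrates that generic technique in plain Python; the class and function names are hypothetical, and this is not DiffuSeq-v2's actual training code.

```python
import math

def global_grad_norm(grads):
    """L2 norm over all gradient values; grads is a list of flat float lists."""
    total = 0.0
    for g in grads:
        for v in g:
            total += v * v  # inf/NaN entries propagate into the total
    return math.sqrt(total)

class DynamicLossScaler:
    """Hypothetical minimal dynamic loss scaler, for illustration only."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=1000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def unscale_and_check(self, scaled_grads):
        """Return (unscaled grads, norm), or (None, norm) if the step must be skipped."""
        grads = [[v / self.scale for v in g] for g in scaled_grads]
        norm = global_grad_norm(grads)
        if not math.isfinite(norm):
            # Overflow detected: skip this optimizer step and halve the scale.
            self.scale /= 2.0
            self.good_steps = 0
            return None, norm
        self.good_steps += 1
        if self.good_steps % self.growth_interval == 0:
            # After a run of finite steps, try a larger scale again.
            self.scale *= 2.0
        return grads, norm

scaler = DynamicLossScaler()
grads, norm = scaler.unscale_and_check([[3.0 * 2 ** 16, 4.0 * 2 ** 16]])
skipped, bad_norm = scaler.unscale_and_check([[1.0, float("nan")]])
```

Skipping the step rather than applying it keeps a single overflowed batch from writing NaNs into the weights; if the norm is NaN on every step even as the scale shrinks, the problem is more likely in the loss itself than in FP16 range.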

About the loss in training_losses_seq2seq() when the time step is t=0

5

Thanks for your great work. I have a question about the loss calculation in training_losses_seq2seq() when the sampled time step is `t=0`: https://github.com/Shark-NLP/DiffuSeq/blob/bdc8f0adbff22e88c8530d1f20c3c7589c061d40/diffuseq/gaussian_diffusion.py#L612-L619 If `t=0`, the `x_t = self.q_sample()` line is incorrect,...

skpig
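To make the question concrete, here is a minimal sketch of the standard Gaussian forward process q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I) used in DDPM-style code. It assumes 0-based step indices where ᾱ is the cumulative product of (1 − β), so ᾱ at index 0 equals 1 − β at index 0. The linear schedule and the function names are illustrative assumptions, not DiffuSeq's actual code or its default noise schedule.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (illustrative assumption only)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alphas_cumprod(betas):
    """Cumulative product of (1 - beta) for each 0-based step index."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out

def q_sample(x0, t, abar, noise):
    """Elementwise x_t = sqrt(abar[t]) * x0 + sqrt(1 - abar[t]) * noise."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

random.seed(0)
betas = linear_betas(2000)
abar = alphas_cumprod(betas)
x0 = [0.5, -1.0, 2.0]
noise = [random.gauss(0.0, 1.0) for _ in x0]
x_t0 = q_sample(x0, 0, abar, noise)  # index 0 is already one noising step
```

Under this indexing convention, sampling at t=0 mixes in a small amount of noise (scale sqrt(β at index 0)) rather than returning x_0 exactly, which is the kind of edge case the question is about.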

train

I'm trying to train with the reference data, but I'm getting the error below. Here are my script parameter settings and the errors. I hope to get an answer, thanks. python -m...

mattat0516

DDPM

3

Thanks @summmeer for sharing the code. Does DiffuSeq use the DDPM model as its foundation?

liurob2000

Nothing generated from decode

2

Hi DiffuSeq authors! I followed the end-to-end example in the README, from training through decoding, but decoding generated nothing. ![image](https://user-images.githubusercontent.com/51283097/227854019-0b12f0c4-c316-4299-9a7c-ba8d8c544694.png) May I know if there's some problem with my...

johnnytam100

Machine Translation Task with DiffuSeq

6

Hi @summmeer, I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq for the paraphrase task, but I want to be...

chiral-carbon

Is there any rule for modifying the parameters?

1

Hello! I trained the model on the WMT16 dataset and changed the parameters to the following values: ![image](https://github.com/Shark-NLP/DiffuSeq/assets/116432930/50f1ba2a-7dce-493c-ab69-2f7cd9ab8b66) The main modifications were dim and seq_len; in addition, I changed the...

zkzhou126

Issues with decoding and evaluation

2

Hi! I am trying to replicate the DiffuSeq model for the paraphrase task with the QQP dataset. I kept everything at the default training config, and for MBR I ran...

chiral-carbon

Only one GPU

2

If I want to switch to single-GPU training, which parts of the code need to be modified, and how?

wangfangfangcn
