
Machine Translation Task with DiffuSeq

chiral-carbon opened this issue 1 year ago • 6 comments

Hi @summmeer,

I was wondering how I might go about implementing a machine translation task with DiffuSeq. I have trained DiffuSeq on the paraphrase task, but I want to use it for translation as well. Would supplying a translation dataset to the existing codebase (since it is designed for seq2seq tasks) suffice, or would further changes be required?

Would appreciate any advice, thanks!
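For concreteness: DiffuSeq's seq2seq tasks read JSONL pairs, so one plausible first step is converting a parallel corpus into that format. A minimal sketch, assuming the `{"src": ..., "trg": ...}` keys from the repo's data layout (the file names and the `datasets/` path here are placeholders):

```python
# Hypothetical sketch: convert a parallel corpus (two aligned files, one
# sentence per line) into the JSONL format DiffuSeq's seq2seq loader expects.
# File names and the datasets/iwslt14 path are placeholders, not from the repo.
import json

def to_diffuseq_jsonl(src_path, trg_path, out_path):
    with open(src_path) as fs, open(trg_path) as ft, open(out_path, "w") as fo:
        for src, trg in zip(fs, ft):
            # Each aligned line pair becomes one {"src": ..., "trg": ...} example.
            fo.write(json.dumps({"src": src.strip(), "trg": trg.strip()}) + "\n")

to_diffuseq_jsonl("train.en", "train.de", "datasets/iwslt14/train.jsonl")
```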

chiral-carbon · Feb 20 '24 06:02

Hi, you can give it a try. But different hyperparameters may lead to different results, including the batch size (bsz), diffusion steps, hidden dimension (dim), sequence length (seq_len), and the tokenizer. Many follow-up works now achieve better MT performance, and you can refer to their codebases, too.
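As a rough illustration of those knobs, here is a sketch that assembles them as CLI flags. The `train.py` entry point and the exact flag names are assumptions modeled on DiffuSeq's training script (check `train.sh` in the repo), and the values are illustrative starting points, not tuned MT settings.

```python
# Hypothetical sketch: the hyperparameters named above, passed as CLI flags.
# Entry point and flag names are assumptions; verify against the repo's
# train.sh before using.
import subprocess

hparams = {
    "--dataset": "iwslt14",            # hypothetical dataset name
    "--data_dir": "datasets/iwslt14",  # hypothetical data path
    "--bsz": "2048",                   # batch size
    "--diff_steps": "2000",            # number of diffusion steps
    "--hidden_dim": "128",             # embedding/hidden dimension
    "--seq_len": "128",                # maximum sequence length
    "--vocab": "bert",                 # tokenizer choice
}
cmd = ["python", "train.py"] + [arg for kv in hparams.items() for arg in kv]
subprocess.run(cmd, check=True)
```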

summmeer · Feb 23 '24 06:02

Yeah, makes sense, thanks! Are you referring to works like SeqDiffuSeq, which builds directly on DiffuSeq?

chiral-carbon · Feb 25 '24 21:02

It depends on what your goal is in using a diffusion model for MT tasks. Follow-up works are not exactly the same as DiffuSeq: SeqDiffuSeq is based on an encoder-decoder architecture, while RDM is based on discrete text diffusion. That work also involves pre-trained MLMs. If you're aiming for performance, you could refer to the SOTA model.

summmeer · Feb 26 '24 03:02

@summmeer thanks, this is very helpful! In the DiNoiSer paper, the authors claim to surpass DiffuSeq's performance on the WMT14 EN->DE task, so I wanted to run a similar comparison between DiffuSeq and DiNoiSer on IWSLT14, but DiffuSeq takes a long time to train. Even for the QQP task reported in the paper, replicating the results took 6.5 days on 4 A100 GPUs (WandB overview). Do you think additional distributed-training code is needed to train DiffuSeq more efficiently?
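For context, the standard way to scale PyTorch training across GPUs is DistributedDataParallel with one process per device. A generic sketch, not taken from the DiffuSeq codebase (the model and data below are stand-ins), launched with `torchrun --nproc_per_node=4`:

```python
# Generic PyTorch DDP sketch (not DiffuSeq-specific) showing the usual pieces
# of multi-GPU training: process-group init, model wrapping, and a
# distributed sampler. Launch with: torchrun --nproc_per_node=4 this_file.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(128, 128).cuda(rank)  # stand-in for the diffusion model
    model = DDP(model, device_ids=[rank])

    data = TensorDataset(torch.randn(1024, 128))  # stand-in for real batches
    sampler = DistributedSampler(data)            # shards the data across ranks
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for (x,) in loader:
        loss = model(x.cuda(rank)).pow(2).mean()  # dummy loss
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```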

Sorry for the trivial question, your replies are really helpful, thanks!

chiral-carbon · Feb 29 '24 04:02

Hi, maybe you can try our updated version 2, which is 4x faster at training and 800x faster at sampling on the QQP dataset. [We have updated the v2 information in README.md.]

summmeer · Feb 29 '24 05:02

I will, thanks a lot!

chiral-carbon · Feb 29 '24 16:02