Emirhan Kurtuluş

Results 26 comments of Emirhan Kurtuluş

In PyTorch, you can convert the pretrained weights to half precision (`torch.float16`) with `model.half()`, but this is not the recommended approach. Instead, you should use `torch.cuda.amp.autocast` while training....
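A minimal sketch of what the autocast-based approach looks like; the model, optimizer, and batch below are placeholders for illustration, and `GradScaler` is disabled automatically when no GPU is present:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4).to(device)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# GradScaler guards against fp16 gradient underflow; no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")

x = torch.randn(8, 16, device=device)          # dummy batch
y = torch.randint(0, 4, (8,), device=device)   # dummy labels

optimizer.zero_grad()
# Ops inside autocast run in float16 where numerically safe;
# the master weights stay in float32, unlike with model.half().
with torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

The key difference from `model.half()` is that the weights remain `torch.float32`; only selected forward-pass ops are downcast.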

You may consider loading the pretrained weights and wrapping each layer or block individually in its own `nn.Module`, where you can manipulate the output of each intermediate layer/block.
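A sketch of the wrapping idea, assuming a stand-in two-block network in place of the real pretrained model; here the per-block hook just records activation norms, but it could modify the output instead:

```python
import torch
import torch.nn as nn

class BlockWithHook(nn.Module):
    """Wraps a single block and applies `transform` to its output."""
    def __init__(self, block, transform=None):
        super().__init__()
        self.block = block
        self.transform = transform

    def forward(self, x):
        out = self.block(x)
        return self.transform(out) if self.transform else out

# Stand-in "pretrained" network made of two blocks.
backbone = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))

# Re-wrap each block; here we simply record activation norms.
norms = []
def record(t):
    norms.append(t.norm().item())
    return t

wrapped = nn.Sequential(*[BlockWithHook(b, record) for b in backbone])
out = wrapped(torch.randn(4, 8))
```

For read-only inspection, PyTorch's built-in `register_forward_hook` achieves the same without re-wrapping; the wrapper approach is mainly useful when you want to alter the intermediate outputs.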

Actually, this might cause more problems than the possible ease of use is worth. Consider the case where one uses `torch.nn.parallel.DistributedDataParallel`: there is no way to ensure that `torch.device("cuda")`...
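To illustrate the pitfall hinted at above: a bare `torch.device("cuda")` refers to whatever the *current* device is, so under DistributedDataParallel every process can silently land on GPU 0 unless each worker pins its own device by local rank. A minimal sketch (the `get_device` helper is hypothetical; launchers such as `torchrun` export `LOCAL_RANK` for each worker):

```python
import os
import torch

def get_device(local_rank: int) -> torch.device:
    """Pin each DDP process to its own GPU; fall back to CPU."""
    if torch.cuda.is_available():
        # Without set_device, "cuda" resolves to device 0 in every process.
        torch.cuda.set_device(local_rank)
        return torch.device(f"cuda:{local_rank}")
    return torch.device("cpu")

local_rank = int(os.environ.get("LOCAL_RANK", 0))
device = get_device(local_rank)
```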

> Hey @ekurtulus, such a low BLEU score looks indeed suspicious! Do you have any training stats / logs / graphs to share? My experiments are on an HPC system,...

> @patrickvonplaten @patil-suraj Do you know if `--dataset_name stas/wmt14-en-de-pre-processed` (which is pre-processed using a script from fairseq) is the good dataset for T5 (En -> German)? > > `T5` is...

> @ekurtulus I also think the checkpoints `t5-small`, `t5-base` etc. have been trained on WMT / CNN Dailymail datasets, as shown in the code snippet below. So using those checkpoints...

Here are the steps I am planning to take:
1. Find a good pretrained model which is not that large (I believe a mid-sized T5 would be a nice choice)...

The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be able...

I am following up on your commit with the following:
- I am adding a script for creating pseudo-data for sanity checks and mock trainings
- Adding PolyLoss support, which...
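For reference, a minimal sketch of the Poly-1 variant of PolyLoss mentioned above: cross-entropy plus an `eps * (1 - p_t)` correction term, where `p_t` is the predicted probability of the target class and `eps` is a tunable hyperparameter (`eps=0` recovers plain cross-entropy):

```python
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, targets, eps=1.0):
    """Poly-1 loss: CE + eps * (1 - p_t), averaged over the batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Probability assigned to the correct class for each sample.
    pt = F.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return (ce + eps * (1.0 - pt)).mean()

logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
loss = poly1_cross_entropy(logits, targets)
```

With `eps=0.0` the function reduces exactly to `F.cross_entropy`, which makes a convenient sanity check for the mock trainings mentioned above.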