Emirhan Kurtuluş

Results 26 comments of Emirhan Kurtuluş

In PyTorch, you can convert the pretrained weights to half precision (`torch.float16`) with `model.half()`, but this is not the recommended approach. Instead, you should use `torch.cuda.amp.autocast` while training....
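A minimal sketch of what the autocast-based approach looks like; the model, optimizer, and batch below are placeholders for illustration, and `GradScaler` is disabled automatically when no GPU is present:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 4).to(device)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# GradScaler guards against fp16 gradient underflow; no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")

x = torch.randn(8, 16, device=device)          # dummy batch
y = torch.randint(0, 4, (8,), device=device)   # dummy labels

optimizer.zero_grad()
# Ops inside autocast run in float16 where numerically safe;
# the master weights stay in float32, unlike with model.half().
with torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

The key difference from `model.half()` is that the weights remain `torch.float32`; only selected forward-pass ops are downcast.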

You may consider loading the pretrained weights and wrapping each layer or block individually in its own `nn.Module`, where you can manipulate the output of each intermediate layer/block.
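A sketch of the wrapping idea, assuming a stand-in two-block network in place of the real pretrained model; here the per-block hook just records activation norms, but it could modify the output instead:

```python
import torch
import torch.nn as nn

class BlockWithHook(nn.Module):
    """Wraps a single block and applies `transform` to its output."""
    def __init__(self, block, transform=None):
        super().__init__()
        self.block = block
        self.transform = transform

    def forward(self, x):
        out = self.block(x)
        return self.transform(out) if self.transform else out

# Stand-in "pretrained" network made of two blocks.
backbone = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))

# Re-wrap each block; here we simply record activation norms.
norms = []
def record(t):
    norms.append(t.norm().item())
    return t

wrapped = nn.Sequential(*[BlockWithHook(b, record) for b in backbone])
out = wrapped(torch.randn(4, 8))
```

For read-only inspection, PyTorch's built-in `register_forward_hook` achieves the same without re-wrapping; the wrapper approach is mainly useful when you want to alter the intermediate outputs.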

Actually, this might cause more problems than the possible ease of use is worth. Consider the case where one uses `torch.nn.parallel.DistributedDataParallel`: there is no way to ensure that `torch.device("cuda")`...
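To illustrate the pitfall hinted at above: a bare `torch.device("cuda")` refers to whatever the *current* device is, so under DistributedDataParallel every process can silently land on GPU 0 unless each worker pins its own device by local rank. A minimal sketch (the `get_device` helper is hypothetical; launchers such as `torchrun` export `LOCAL_RANK` for each worker):

```python
import os
import torch

def get_device(local_rank: int) -> torch.device:
    """Pin each DDP process to its own GPU; fall back to CPU."""
    if torch.cuda.is_available():
        # Without set_device, "cuda" resolves to device 0 in every process.
        torch.cuda.set_device(local_rank)
        return torch.device(f"cuda:{local_rank}")
    return torch.device("cpu")

local_rank = int(os.environ.get("LOCAL_RANK", 0))
device = get_device(local_rank)
```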

> Hey @ekurtulus, such a low BLEU score looks indeed suspicious! Do you have any training stats / logs / graphs to share? My experiments are on an HPC system,...

> @patrickvonplaten @patil-suraj Do you know if `--dataset_name stas/wmt14-en-de-pre-processed` (which is pre-processed using a script from fairseq) is the good dataset for T5 (En -> German)? > > `T5` is...

> @ekurtulus I also think the checkpoints `t5-small`, `t5-base` etc. have been trained on WMT / CNN Dailymail datasets, as shown in the code snippet below. So using those checkpoints...

Here are the steps I am planning to take:
1. Find a good pretrained model which is not that large (I believe a mid-sized T5 would be a nice choice)...

The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be able...

I am following up on your commit with the following:
- I am adding a script for creating pseudo-data for sanity checks and mock trainings
- Adding PolyLoss support, which...
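For reference, a minimal sketch of the Poly-1 variant of PolyLoss mentioned above: cross-entropy plus an `eps * (1 - p_t)` correction term, where `p_t` is the predicted probability of the target class and `eps` is a tunable hyperparameter (`eps=0` recovers plain cross-entropy):

```python
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, targets, eps=1.0):
    """Poly-1 loss: CE + eps * (1 - p_t), averaged over the batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Probability assigned to the correct class for each sample.
    pt = F.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return (ce + eps * (1.0 - pt)).mean()

logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
loss = poly1_cross_entropy(logits, targets)
```

With `eps=0.0` the function reduces exactly to `F.cross_entropy`, which makes a convenient sanity check for the mock trainings mentioned above.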