What is the training config?
Hello, thanks for your work! I want to try to reproduce this work myself, but I can't reach the high performance reported for mT0-xxl finetuned on xP3 in the paper Crosslingual Generalization through Multitask Finetuning. Could you share the training details: how many steps did you train the model for, and what is your learning-rate decay ratio? Could I get the config file so I can reproduce your results? Thank you very much!
I've reached out to @adarob, who knows those details and has the config files. I'll let you know if we can release them!
It's just the default T5X finetune configuration (https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin) with the following overrides:
```gin
BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False
```
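In case it helps, here's a minimal sketch of what those overrides could look like as a single gin file layered on top of finetune.gin. The file name, mixture name, and paths below are placeholders, not something from our setup, so adjust them to your environment:

```gin
# mt0_xxl_xp3.gin -- hypothetical override file; paths and names are placeholders.
from __gin__ import dynamic_registration

from t5x import utils
# Also import the Python module that registers your SeqIO mixture, e.g.:
# import my_project.tasks

include "t5x/examples/t5/mt5/xxl.gin"        # model definition (adjust to your checkout)
include "t5x/configs/runs/finetune.gin"      # default finetune run config

# The overrides listed above.
BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False

# Remaining macros required by finetune.gin -- point these at your own setup.
MIXTURE_OR_TASK_NAME = 'xp3_mixture'         # your registered SeqIO mixture (see below)
INITIAL_CHECKPOINT_PATH = 'gs://your-bucket/mt5_xxl/checkpoint_1000000'
MODEL_DIR = 'gs://your-bucket/mt0_xxl_finetune'
# TRAIN_STEPS also has to be set -- see the note on step counts below.
```

You'd then pass this file to t5x.train via the --gin_file flag.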
Of course, you'd also need to set up the mixture if you're using SeqIO.
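For the SeqIO side, a rough sketch of what registering such a mixture could look like is below. The task name, file pattern, and mixing rate are made up for illustration and are not the mixture we used:

```python
# tasks.py -- hypothetical SeqIO registration module; dataset paths are placeholders.
import functools

import seqio
import tensorflow as tf

# mC4 SentencePiece vocabulary used by mT5/mT0.
VOCAB = seqio.SentencePieceVocabulary(
    'gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model')

OUTPUT_FEATURES = {
    'inputs': seqio.Feature(vocabulary=VOCAB, add_eos=True),
    'targets': seqio.Feature(vocabulary=VOCAB, add_eos=True),
}


def _parse_tsv(dataset):
  """Splits each tab-separated line into an {'inputs', 'targets'} example."""
  def _parse(line):
    fields = tf.io.decode_csv(
        line, record_defaults=['', ''], field_delim='\t', use_quote_delim=False)
    return {'inputs': fields[0], 'targets': fields[1]}
  return dataset.map(_parse, num_parallel_calls=tf.data.AUTOTUNE)


# One SeqIO task per data subset; a single TSV-backed example task is shown here.
seqio.TaskRegistry.add(
    'xp3_example_task',
    source=seqio.TextLineDataSource(
        split_to_filepattern={'train': '/path/to/xp3/example/*.tsv'}),  # placeholder
    preprocessors=[
        _parse_tsv,
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos_after_trim,
    ],
    output_features=OUTPUT_FEATURES,
    metric_fns=[],
)

# The mixture referenced by MIXTURE_OR_TASK_NAME in the gin file above.
seqio.MixtureRegistry.add(
    'xp3_mixture',
    ['xp3_example_task'],  # in practice: one entry per xP3 task
    default_rate=1.0,
)
```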
We only trained for 30k steps and picked the best checkpoint, which I believe was around 7k.
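In gin terms that corresponds to something like the lines below, with the caveat that, as far as I recall, TRAIN_STEPS in finetune.gin is an absolute step count that includes the pretraining steps, and that the checkpoint settings here are just illustrative values:

```gin
# Added to the override file sketched above; only the 30k finetuning steps come
# from our run, the remaining values are assumptions.
TRAIN_STEPS = 1030000                       # 1M mT5 pretraining steps + 30k finetuning steps
utils.SaveCheckpointConfig.period = 1000    # checkpoint often enough to catch an early best step
utils.SaveCheckpointConfig.keep = None      # keep all checkpoints so ~step 7k is still available
```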
Thanks a lot! I will try this.