What is the training config?
Hello, thanks for your work! I want to try to reproduce this work myself, but I can't reach the high performance reported for mT0-xxl finetuned on xP3 in the paper Crosslingual Generalization through Multitask Finetuning. Could you share the training details: how many steps did you train the model for, and what is your learning-rate decay ratio? Could I get the config file so I can reproduce your results? Thank you very much!
I've reached out to @adarob, who knows those details and has the config files. I'll let you know if we can release them!
It's just the default T5X finetune configuration (https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin) with the following overrides:
```gin
BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False
```
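In case it helps, here's a minimal sketch of what those overrides could look like as a single gin file layered on top of finetune.gin. The file name, mixture name, and paths below are placeholders, not something from our setup, so adjust them to your environment:

```gin
# mt0_xxl_xp3.gin -- hypothetical override file; paths and names are placeholders.
from __gin__ import dynamic_registration

from t5x import utils
# Also import the Python module that registers your SeqIO mixture, e.g.:
# import my_project.tasks

include "t5x/examples/t5/mt5/xxl.gin"        # model definition (adjust to your checkout)
include "t5x/configs/runs/finetune.gin"      # default finetune run config

# The overrides listed above.
BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False

# Remaining macros required by finetune.gin -- point these at your own setup.
MIXTURE_OR_TASK_NAME = 'xp3_mixture'         # your registered SeqIO mixture (see below)
INITIAL_CHECKPOINT_PATH = 'gs://your-bucket/mt5_xxl/checkpoint_1000000'
MODEL_DIR = 'gs://your-bucket/mt0_xxl_finetune'
# TRAIN_STEPS also has to be set -- see the note on step counts below.
```

You'd then pass this file to t5x.train via the --gin_file flag.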
Of course, you'd also need to set up the mixture if you're using SeqIO.
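For the SeqIO side, a rough sketch of what registering such a mixture could look like is below. The task name, file pattern, and mixing rate are made up for illustration and are not the mixture we used:

```python
# tasks.py -- hypothetical SeqIO registration module; dataset paths are placeholders.
import functools

import seqio
import tensorflow as tf

# mC4 SentencePiece vocabulary used by mT5/mT0.
VOCAB = seqio.SentencePieceVocabulary(
    'gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model')

OUTPUT_FEATURES = {
    'inputs': seqio.Feature(vocabulary=VOCAB, add_eos=True),
    'targets': seqio.Feature(vocabulary=VOCAB, add_eos=True),
}


def _parse_tsv(dataset):
  """Splits each tab-separated line into an {'inputs', 'targets'} example."""
  def _parse(line):
    fields = tf.io.decode_csv(
        line, record_defaults=['', ''], field_delim='\t', use_quote_delim=False)
    return {'inputs': fields[0], 'targets': fields[1]}
  return dataset.map(_parse, num_parallel_calls=tf.data.AUTOTUNE)


# One SeqIO task per data subset; a single TSV-backed example task is shown here.
seqio.TaskRegistry.add(
    'xp3_example_task',
    source=seqio.TextLineDataSource(
        split_to_filepattern={'train': '/path/to/xp3/example/*.tsv'}),  # placeholder
    preprocessors=[
        _parse_tsv,
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos_after_trim,
    ],
    output_features=OUTPUT_FEATURES,
    metric_fns=[],
)

# The mixture referenced by MIXTURE_OR_TASK_NAME in the gin file above.
seqio.MixtureRegistry.add(
    'xp3_mixture',
    ['xp3_example_task'],  # in practice: one entry per xP3 task
    default_rate=1.0,
)
```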
We only trained for 30k steps and picked the best checkpoint, which I believe was around 7k.
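In gin terms that corresponds to something like the lines below, with the caveat that, as far as I recall, TRAIN_STEPS in finetune.gin is an absolute step count that includes the pretraining steps, and that the checkpoint settings here are just illustrative values:

```gin
# Added to the override file sketched above; only the 30k finetuning steps come
# from our run, the remaining values are assumptions.
TRAIN_STEPS = 1030000                       # 1M mT5 pretraining steps + 30k finetuning steps
utils.SaveCheckpointConfig.period = 1000    # checkpoint often enough to catch an early best step
utils.SaveCheckpointConfig.keep = None      # keep all checkpoints so ~step 7k is still available
```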
Thanks a lot! I will try this.