Results: 26 comments of Jxu-Thu

If I use a smaller setup, e.g. num_gpus=8 num_nodes=1 (batch size 4096, with accum_steps=8), should I modify the other configurations, such as max_steps?
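For reference, a minimal sketch of the batch-size bookkeeping, assuming a ViLT-style config where the desired global batch is split across GPUs and gradient-accumulation steps (the variable names and numbers here are illustrative, not the repo's):

```python
import math

# Illustrative numbers only; plug in your own setup.
desired_batch_size = 4096   # global batch the training schedule assumes
num_nodes = 1
num_gpus = 8                # GPUs per node
per_gpu_batchsize = 64      # what fits in memory on one GPU (assumption)

# Gradients are accumulated until the effective batch matches the target.
effective_per_step = per_gpu_batchsize * num_gpus * num_nodes
accum_steps = math.ceil(desired_batch_size / effective_per_step)  # -> 8 here

# As long as the effective batch size stays at 4096, the optimizer sees the
# same number of updates, so max_steps should not need to change.
print(f"accum_steps = {accum_steps}, "
      f"effective batch = {effective_per_step * accum_steps}")
```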

Many thanks for your kind reply! I am trying to reproduce the results with 24 V100 GPUs, accumulation steps 3, and a batch size over 4k, without modifying any other configurations.

I found very slow training speed due to the large number of training iterations in each epoch. I tried to inspect why there are so many iterations when using a small batch size. Given the vg+mscoco+gcc+sbu...

vg+mscoco+gcc+sbu
INFO - ViLT - Running command 'print_config'
INFO - ViLT - Started
Configuration (modified, added, typechanged, doc):
batch_size = 4096 # this is a desired batch size; pl...
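A quick back-of-the-envelope check of iterations per epoch, assuming hypothetical caption counts for the four pretraining corpora (the real counts depend on your downloaded copies):

```python
# Hypothetical example counts; substitute the sizes of your local copies.
dataset_sizes = {
    "vg": 5_400_000,      # Visual Genome captions (assumption)
    "mscoco": 590_000,    # COCO captions (assumption)
    "gcc": 3_000_000,     # Conceptual Captions (assumption)
    "sbu": 860_000,       # SBU captions (assumption)
}
total_samples = sum(dataset_sizes.values())

effective_batch_size = 4096  # the desired global batch from the config
iters_per_epoch = total_samples // effective_batch_size
print(f"{total_samples} samples -> ~{iters_per_epoch} iterations/epoch")

# If a data-processing bug inflates total_samples (e.g. duplicated entries),
# iterations per epoch grow proportionally, which matches the symptom above.
```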

Thanks! I made a mistake in the data processing. After fixing it, I get a similar number of iterations to yours.

I have the same question about it.

I can run the training scripts on fairseq 0.10.2, but fairseq 0.10.2 does not support inference with KenLM and Flashlight (I also tried wav2letter). I tried the master branch...
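For what it's worth, the KenLM side on its own can be exercised through its Python bindings, independent of the fairseq decoder wiring; a minimal sketch (the .arpa path is a placeholder):

```python
import kenlm  # Python bindings from https://github.com/kpu/kenlm

# Placeholder path; point this at your trained n-gram model.
model = kenlm.Model("lm/4gram.arpa")

# Log10 probability of a hypothesis, with BOS/EOS markers included.
hyp = "the quick brown fox"
print(model.score(hyp, bos=True, eos=True))

# During beam-search decoding, scores like this get combined with the
# acoustic model's CTC scores, weighted by an LM weight and a word score.
```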

> 0.10.2 supports KenLM as well as other LMs

It seems that it cannot support a TransformerLM? Also, the 0.10.2 version can only run inference on CUDA.
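As a side note, one way to sanity-check a TransformerLM checkpoint outside the decoder is fairseq's hub interface; a minimal sketch, where the directory and checkpoint name are placeholders for your own trained LM:

```python
from fairseq.models.transformer_lm import TransformerLanguageModel

# Placeholder directory/checkpoint; substitute your own trained LM.
lm = TransformerLanguageModel.from_pretrained(
    "/path/to/lm_dir", "checkpoint_best.pt"
)
lm.eval()   # disable dropout
lm.cuda()   # 0.10.2-era inference paths largely assume a GPU

# Score a sentence to confirm the checkpoint loads and runs.
print(lm.score("the quick brown fox")["positional_scores"].mean())
```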