Jxu-Thu
If I use a smaller setup, e.g. num_gpus=8 num_nodes=1 (batch size 4096 with accum_steps=8), should I modify the other configurations, such as max_steps?
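A minimal sketch of the arithmetic behind this question (variable names here are illustrative, not ViLT's actual config keys): the effective global batch is world_size × per-GPU batch × accumulation steps, so as long as that product still equals the desired 4096, the optimizer sees the same batch per step and max_steps would not need to change.

```python
# Sketch: relating gradient-accumulation steps to a desired global batch.
# All names below are illustrative, not ViLT's exact configuration variables.

def accum_steps(desired_batch, num_gpus, num_nodes, per_gpu_batch):
    """Gradient-accumulation steps needed to reach the desired global batch."""
    world_size = num_gpus * num_nodes
    return desired_batch // (world_size * per_gpu_batch)

# 8 GPUs on 1 node, 64 samples per GPU per forward pass -> 8 accumulation steps
print(accum_steps(4096, num_gpus=8, num_nodes=1, per_gpu_batch=64))  # -> 8
```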
Many thanks for your kind reply! I am trying to reproduce the results on 24 V100 GPUs with accumulation steps 3 and a batch size over 4k, without modifying any other configurations.
Thanks for the reminder.
Training is very slow for me because of the large number of iterations in each epoch. I am trying to figure out why there are so many iterations when using a small batch size. Given vg+mscoco+gcc+sbu...
vg+mscoco+gcc+sbu
INFO - ViLT - Running command 'print_config'
INFO - ViLT - Started
Configuration (modified, added, typechanged, doc):
  batch_size = 4096  # this is a desired batch size; pl...
Thanks! I made a mistake in the data processing. After fixing it, I get a similar number of iterations to yours.
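For anyone checking their own setup, the expected iteration count per epoch is just the dataset size divided by the global batch size. A quick sketch (the ~4.1M pair count below is only an illustrative figure, not the exact size of vg+mscoco+gcc+sbu):

```python
import math

def iters_per_epoch(num_samples, global_batch):
    """Optimizer iterations needed to see every sample once per epoch."""
    return math.ceil(num_samples / global_batch)

# Illustrative: ~4.1M image-text pairs with a global batch of 4096
print(iters_per_epoch(4_100_000, 4096))  # -> 1001
```

If the observed count is far larger than this, the dataloader is likely yielding duplicated or per-GPU-unsharded data, as happened here.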
I have the same question about it
I can run the training scripts on fairseq 0.10.2, but fairseq 0.10.2 does not support inference with KenLM and flashlight (I also tried wav2letter). I tried the master branch...
> 0.10.2 supports KenLM as well as other LMs

It seems that it does not support TransformerLM? And the 0.10.2 version can only run inference with CUDA.
I tried CUDA 10.2 and 11.0, and torch 1.5.1 and 1.7.1, and got the same error.