Results: 26 comments of Jxu-Thu

If I use a smaller setup, e.g. num_gpus=8 num_nodes=1 (batch size 4096, with accum_steps=8), should I modify the other configurations, such as max_steps?
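For reference, a minimal sketch of the batch-size bookkeeping, assuming a ViLT-style config where the desired global batch is split across GPUs and gradient-accumulation steps (the variable names and numbers here are illustrative, not the repo's):

```python
import math

# Illustrative numbers only; plug in your own setup.
desired_batch_size = 4096   # global batch the training schedule assumes
num_nodes = 1
num_gpus = 8                # GPUs per node
per_gpu_batchsize = 64      # what fits in memory on one GPU (assumption)

# Gradients are accumulated until the effective batch matches the target.
effective_per_step = per_gpu_batchsize * num_gpus * num_nodes
accum_steps = math.ceil(desired_batch_size / effective_per_step)  # -> 8 here

# As long as the effective batch size stays at 4096, the optimizer sees the
# same number of updates, so max_steps should not need to change.
print(f"accum_steps = {accum_steps}, "
      f"effective batch = {effective_per_step * accum_steps}")
```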

Many thanks for your kind reply! I am trying to reproduce the results with 24 V100 GPUs, accumulation steps 3, and a batch size over 4k, without modifying any other configurations.

I found very slow training speed due to the large number of training iterations in each epoch. I tried to inspect why there are so many iterations when using a small batch size. Given the vg+mscoco+gcc+sbu...

vg+mscoco+gcc+sbu
INFO - ViLT - Running command 'print_config'
INFO - ViLT - Started
Configuration (modified, added, typechanged, doc):
batch_size = 4096 # this is a desired batch size; pl...
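A quick back-of-the-envelope check of iterations per epoch, assuming hypothetical caption counts for the four pretraining corpora (the real counts depend on your downloaded copies):

```python
# Hypothetical example counts; substitute the sizes of your local copies.
dataset_sizes = {
    "vg": 5_400_000,      # Visual Genome captions (assumption)
    "mscoco": 590_000,    # COCO captions (assumption)
    "gcc": 3_000_000,     # Conceptual Captions (assumption)
    "sbu": 860_000,       # SBU captions (assumption)
}
total_samples = sum(dataset_sizes.values())

effective_batch_size = 4096  # the desired global batch from the config
iters_per_epoch = total_samples // effective_batch_size
print(f"{total_samples} samples -> ~{iters_per_epoch} iterations/epoch")

# If a data-processing bug inflates total_samples (e.g. duplicated entries),
# iterations per epoch grow proportionally, which matches the symptom above.
```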

Thanks! I made a mistake in the data processing. After fixing it, I get a similar number of iterations to yours.

I have the same question about it.

I can run the training scripts on fairseq 0.10.2, but fairseq 0.10.2 does not support inference with KenLM and Flashlight (I also tried wav2letter). I tried the master branch...
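For what it's worth, the KenLM side on its own can be exercised through its Python bindings, independent of the fairseq decoder wiring; a minimal sketch (the .arpa path is a placeholder):

```python
import kenlm  # Python bindings from https://github.com/kpu/kenlm

# Placeholder path; point this at your trained n-gram model.
model = kenlm.Model("lm/4gram.arpa")

# Log10 probability of a hypothesis, with BOS/EOS markers included.
hyp = "the quick brown fox"
print(model.score(hyp, bos=True, eos=True))

# During beam-search decoding, scores like this get combined with the
# acoustic model's CTC scores, weighted by an LM weight and a word score.
```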

> 0.10.2 supports KenLM as well as other LMs

It seems that it cannot support a TransformerLM? Also, the 0.10.2 version can only run inference on CUDA.
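As a side note, one way to sanity-check a TransformerLM checkpoint outside the decoder is fairseq's hub interface; a minimal sketch, where the directory and checkpoint name are placeholders for your own trained LM:

```python
from fairseq.models.transformer_lm import TransformerLanguageModel

# Placeholder directory/checkpoint; substitute your own trained LM.
lm = TransformerLanguageModel.from_pretrained(
    "/path/to/lm_dir", "checkpoint_best.pt"
)
lm.eval()   # disable dropout
lm.cuda()   # 0.10.2-era inference paths largely assume a GPU

# Score a sentence to confirm the checkpoint loads and runs.
print(lm.score("the quick brown fox")["positional_scores"].mean())
```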