
Wav2Vec 2 pretraining bug

Open jubick1337 opened this issue 4 years ago • 4 comments

🐛 Bug

Loss drops to very low values and accuracy hits 1 after only a few hundred updates. I'm sure this is a bug and these metrics are wrong.

To Reproduce

  1. Get a considerable amount of wav files (2k hours in my case)
  2. Split the data into train/valid manifests (--valid-percent set to 0.05)
  3. Start pretraining with the default wav2vec2_large_librivox config (see the example commands below)
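
For reference, the steps above correspond roughly to the following commands (paths are placeholders; the flags follow the examples/wav2vec README, so treat this as a sketch to check against your fairseq checkout):

  # 1) Build train/valid manifests, holding out 5% of the data for validation
  python examples/wav2vec/wav2vec_manifest.py /path/to/wavs \
      --dest /path/to/manifest --ext wav --valid-percent 0.05

  # 2) Launch pretraining with the stock wav2vec2_large_librivox config
  fairseq-hydra-train task.data=/path/to/manifest \
      --config-dir examples/wav2vec/config/pretraining \
      --config-name wav2vec2_large_librivox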

Logs:

[2021-07-06 06:58:09,462][train_inner][INFO] - {"epoch": 1, "update": 0.002, "loss": "6.503", "ntokens": "1237.21", "nsentences": "12.44", "prob_perplexity": "107.961", "code_perplexity": "105.203", "temp": "1.999", "loss_0": "6.383", "loss_1": "0.12", "accuracy": "0.07339", "wps": "4161.8", "ups": "3.36", "wpb": "1237.2", "bsz": "12.4", "num_updates": "200", "lr": "3.125e-05", "gnorm": "3.672", "loss_scale": "64", "train_wall": "62", "gb_free": "7.6", "wall": "72"}
[2021-07-06 06:59:09,734][train_inner][INFO] - {"epoch": 1, "update": 0.003, "loss": "5.939", "ntokens": "1199.67", "nsentences": "12.82", "prob_perplexity": "39.035", "code_perplexity": "38.185", "temp": "1.997", "loss_0": "5.804", "loss_1": "0.135", "accuracy": "0.2277", "wps": "3980.9", "ups": "3.32", "wpb": "1199.7", "bsz": "12.8", "num_updates": "400", "lr": "6.25e-05", "gnorm": "4.445", "loss_scale": "64", "train_wall": "59", "gb_free": "12.5", "wall": "132"}
[2021-07-06 06:59:27,229][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
[2021-07-06 07:00:11,138][train_inner][INFO] - {"epoch": 1, "update": 0.005, "loss": "3.243", "ntokens": "1225.73", "nsentences": "12.245", "prob_perplexity": "3.89", "code_perplexity": "3.88", "temp": "1.995", "loss_0": "3.1", "loss_1": "0.143", "accuracy": "0.74319", "wps": "3992.4", "ups": "3.26", "wpb": "1225.7", "bsz": "12.2", "num_updates": "600", "lr": "9.375e-05", "gnorm": "5.04", "loss_scale": "32", "train_wall": "60", "gb_free": "12.9", "wall": "193"}
[2021-07-06 07:01:11,700][train_inner][INFO] - {"epoch": 1, "update": 0.006, "loss": "0.683", "ntokens": "1235.98", "nsentences": "12.32", "prob_perplexity": "2.294", "code_perplexity": "2.295", "temp": "1.993", "loss_0": "0.539", "loss_1": "0.144", "accuracy": "0.95837", "wps": "4081.8", "ups": "3.3", "wpb": "1236", "bsz": "12.3", "num_updates": "800", "lr": "0.000125", "gnorm": "1.244", "loss_scale": "32", "train_wall": "59", "gb_free": "11.1", "wall": "254"}
[2021-07-06 07:02:11,323][train_inner][INFO] - {"epoch": 1, "update": 0.008, "loss": "0.144", "ntokens": "1205.85", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.991", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4045", "ups": "3.35", "wpb": "1205.8", "bsz": "12.4", "num_updates": "1000", "lr": "0.00015625", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "11.8", "wall": "313"}
[2021-07-06 07:03:10,747][train_inner][INFO] - {"epoch": 1, "update": 0.009, "loss": "0.144", "ntokens": "1205.47", "nsentences": "12.555", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.989", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4057.2", "ups": "3.37", "wpb": "1205.5", "bsz": "12.6", "num_updates": "1200", "lr": "0.0001875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "11.3", "wall": "373"}
[2021-07-06 07:04:09,589][train_inner][INFO] - {"epoch": 1, "update": 0.011, "loss": "0.144", "ntokens": "1174.7", "nsentences": "11.985", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.987", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "3992.8", "ups": "3.4", "wpb": "1174.7", "bsz": "12", "num_updates": "1400", "lr": "0.00021875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "13.3", "wall": "432"}
[2021-07-06 07:05:10,021][train_inner][INFO] - {"epoch": 1, "update": 0.012, "loss": "0.144", "ntokens": "1230.74", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.985", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4073.2", "ups": "3.31", "wpb": "1230.7", "bsz": "12.4", "num_updates": "1600", "lr": "0.00025", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "12.3", "wall": "492"}

Expected behavior

A smoother descent?

Environment

  • fairseq Version (e.g., 1.0 or master): 0794f9a
  • PyTorch Version (e.g., 1.0): 1.9.0a0+df837d0
  • OS (e.g., Linux): Ubuntu 20.04 LTS
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): python setup.py build_ext --inplace
  • Python version: 3.8.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: RTX 3090
  • Any other relevant information:

Additional context

Changing the LR solves this issue, but what if one wants to use the exact parameters from the paper?
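
For reference, a lower LR can be passed as a Hydra override on the command line; a minimal sketch, with an illustrative value rather than a recommendation:

  # Illustrative only: override the config's peak learning rate
  fairseq-hydra-train task.data=/path/to/manifest \
      optimization.lr='[0.0005]' \
      --config-dir examples/wav2vec/config/pretraining \
      --config-name wav2vec2_large_librivox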

jubick1337 avatar Jul 06 '21 07:07 jubick1337

If you want to use the exact parameters from the paper, you most likely also need to use the exact number of GPUs used in the paper.
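
For what it's worth, the fairseq wav2vec README describes simulating the paper's GPU count on k local GPUs by accumulating gradients. A rough sketch (the target of 128 GPUs for the large LibriVox config and the exact override names are assumptions to double-check against the README at your commit):

  # Simulate N GPUs on k local GPUs: update_freq = N / k
  # e.g. a single GPU standing in for 128 GPUs -> update_freq='[128]'
  fairseq-hydra-train task.data=/path/to/manifest \
      distributed_training.distributed_world_size=1 \
      +optimization.update_freq='[128]' \
      --config-dir examples/wav2vec/config/pretraining \
      --config-name wav2vec2_large_librivox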

medabalimi avatar Jul 06 '21 14:07 medabalimi

@medabalimi Can I somehow adjust the parameters to my hardware setup?

jubick1337 avatar Jul 08 '21 19:07 jubick1337

@jubick1337 Same issue here. If you find any solutions, could you let me know?

Megumu2597 avatar Aug 06 '21 01:08 Megumu2597

We also ran into the same problem as you: the accuracy metric during evaluation was always close to 1.0.

Looking forward to your kind reply.

zw76859420 avatar Jun 27 '24 05:06 zw76859420