Wav2Vec 2 pretraining bug
🐛 Bug
The loss drops to very low values and accuracy reaches 1 after only a few hundred updates. I'm sure this is a bug and these metrics are wrong.
To Reproduce
- Get any considerable amount of wavs (2k hours in my case)
- Split the data using a manifest (--valid-percent set to 0.05)
- Start pretraining with the default wav2vec2_large_librivox config (roughly the commands shown below)
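Here is roughly what was run (paths are placeholders and the flags follow the wav2vec 2.0 README, so treat this as a sketch rather than the exact invocation):

# build train/valid manifests, holding out 5% of the wavs for validation
python examples/wav2vec/wav2vec_manifest.py /path/to/wavs --dest /path/to/manifest --ext wav --valid-percent 0.05
# start pretraining with the default large config
fairseq-hydra-train task.data=/path/to/manifest \
  --config-dir examples/wav2vec/config/pretraining \
  --config-name wav2vec2_large_librivox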
Logs:
[2021-07-06 06:58:09,462][train_inner][INFO] - {"epoch": 1, "update": 0.002, "loss": "6.503", "ntokens": "1237.21", "nsentences": "12.44", "prob_perplexity": "107.961", "code_perplexity": "105.203", "temp": "1.999", "loss_0": "6.383", "loss_1": "0.12", "accuracy": "0.07339", "wps": "4161.8", "ups": "3.36", "wpb": "1237.2", "bsz": "12.4", "num_updates": "200", "lr": "3.125e-05", "gnorm": "3.672", "loss_scale": "64", "train_wall": "62", "gb_free": "7.6", "wall": "72"}
[2021-07-06 06:59:09,734][train_inner][INFO] - {"epoch": 1, "update": 0.003, "loss": "5.939", "ntokens": "1199.67", "nsentences": "12.82", "prob_perplexity": "39.035", "code_perplexity": "38.185", "temp": "1.997", "loss_0": "5.804", "loss_1": "0.135", "accuracy": "0.2277", "wps": "3980.9", "ups": "3.32", "wpb": "1199.7", "bsz": "12.8", "num_updates": "400", "lr": "6.25e-05", "gnorm": "4.445", "loss_scale": "64", "train_wall": "59", "gb_free": "12.5", "wall": "132"}
[2021-07-06 06:59:27,229][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
[2021-07-06 07:00:11,138][train_inner][INFO] - {"epoch": 1, "update": 0.005, "loss": "3.243", "ntokens": "1225.73", "nsentences": "12.245", "prob_perplexity": "3.89", "code_perplexity": "3.88", "temp": "1.995", "loss_0": "3.1", "loss_1": "0.143", "accuracy": "0.74319", "wps": "3992.4", "ups": "3.26", "wpb": "1225.7", "bsz": "12.2", "num_updates": "600", "lr": "9.375e-05", "gnorm": "5.04", "loss_scale": "32", "train_wall": "60", "gb_free": "12.9", "wall": "193"}
[2021-07-06 07:01:11,700][train_inner][INFO] - {"epoch": 1, "update": 0.006, "loss": "0.683", "ntokens": "1235.98", "nsentences": "12.32", "prob_perplexity": "2.294", "code_perplexity": "2.295", "temp": "1.993", "loss_0": "0.539", "loss_1": "0.144", "accuracy": "0.95837", "wps": "4081.8", "ups": "3.3", "wpb": "1236", "bsz": "12.3", "num_updates": "800", "lr": "0.000125", "gnorm": "1.244", "loss_scale": "32", "train_wall": "59", "gb_free": "11.1", "wall": "254"}
[2021-07-06 07:02:11,323][train_inner][INFO] - {"epoch": 1, "update": 0.008, "loss": "0.144", "ntokens": "1205.85", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.991", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4045", "ups": "3.35", "wpb": "1205.8", "bsz": "12.4", "num_updates": "1000", "lr": "0.00015625", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "11.8", "wall": "313"}
[2021-07-06 07:03:10,747][train_inner][INFO] - {"epoch": 1, "update": 0.009, "loss": "0.144", "ntokens": "1205.47", "nsentences": "12.555", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.989", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4057.2", "ups": "3.37", "wpb": "1205.5", "bsz": "12.6", "num_updates": "1200", "lr": "0.0001875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "11.3", "wall": "373"}
[2021-07-06 07:04:09,589][train_inner][INFO] - {"epoch": 1, "update": 0.011, "loss": "0.144", "ntokens": "1174.7", "nsentences": "11.985", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.987", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "3992.8", "ups": "3.4", "wpb": "1174.7", "bsz": "12", "num_updates": "1400", "lr": "0.00021875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "13.3", "wall": "432"}
[2021-07-06 07:05:10,021][train_inner][INFO] - {"epoch": 1, "update": 0.012, "loss": "0.144", "ntokens": "1230.74", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.985", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4073.2", "ups": "3.31", "wpb": "1230.7", "bsz": "12.4", "num_updates": "1600", "lr": "0.00025", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "12.3", "wall": "492"}
Expected behavior
A smoother descent of the loss?
Environment
- fairseq Version (e.g., 1.0 or master): 0794f9a
- PyTorch Version (e.g., 1.0) 1.9.0a0+df837d0
- OS (e.g., Linux): Ubuntu 20.04 LTS
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): python setup.py build_ext --inplace
- Python version: 3.8.8
- CUDA/cuDNN version: 11.2
- GPU models and configuration: RTX 3090
- Any other relevant information:
Additional context
Changing the LR solves this issue, but what if one wants to use the exact parameters from the paper?
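The workaround was simply lowering the learning rate via a hydra override, e.g. (the value below is only illustrative, not the paper's setting):

fairseq-hydra-train task.data=/path/to/manifest \
  optimization.lr='[0.0001]' \
  --config-dir examples/wav2vec/config/pretraining \
  --config-name wav2vec2_large_librivox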
Most likely you need to use the same number of GPUs as in the paper if you want to use the exact parameters from the paper.
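With fewer GPUs, the effective batch size from the paper can usually be recovered with gradient accumulation instead. A sketch following the wav2vec README (this assumes the large config targets 128 GPUs, so the update_freq value is illustrative and should be 128/k when training on k GPUs):

fairseq-hydra-train task.data=/path/to/manifest \
  distributed_training.distributed_world_size=1 \
  +optimization.update_freq='[128]' \
  --config-dir examples/wav2vec/config/pretraining \
  --config-name wav2vec2_large_librivox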
@medabalimi Can I somehow adjust parameters to my hardware setup?
@jubick1337 Same issue here. If you find any solutions, could you let me know?
We also encountered the same problem as you: the accuracy metric during evaluation was always close to 1.0.
Looking forward to your kind reply.