[Question] Is this training log normal or not?
Hi,
First of all, I want to thank you for your contribution.
Today I used your example to retrain voxceleb/ResNet34. The dataset is the default vox1 and vox2 (downloaded via your default utils script), plus my private set of 800 ids (I mixed my private data with the vox2 data).
After training for 1 epoch, the training log output is:
The Acc is only 0.015 - 0.04 for the first epoch. Is this normal or not?
(I ask because the training time is too long (one week), and waiting until training finishes is not a good idea.)
Thank you in advance.
@hungnvk54 I think it is normal. The picture shows early training: only 100 batches have passed, and no complete epoch has finished.
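For intuition on why low accuracy is normal this early: with thousands of speaker classes, an untrained classifier sits near chance level, roughly 1/num_classes. A small sketch (the class count and batch size below are assumptions for illustration, not taken from this thread):

```python
import torch

# Assumed class count, on the order of VoxCeleb2's speaker count.
num_classes = 6000
logits = torch.randn(256, num_classes)            # random "untrained" outputs
labels = torch.randint(0, num_classes, (256,))
acc = (logits.argmax(dim=1) == labels).float().mean().item()
print(acc)  # near 0.0: chance level is about 1/6000
```

So single-digit-percent accuracy after the first hundred batches is already well above chance.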
I'm using the shard datatype, so is the batch size here counted by number of audio files or by number of shards?
By number of audio files.
@cdliang11 Thanks for your response.
If it is counted by audio files, that means we need 1.5 days/epoch, which is too long.
I'm training on a server with two RTX 2080 Ti GPUs. Is this normal?
And what hardware was the pretrained model trained on?
- It's abnormal.
- pretrained model: https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md
Thanks @cdliang11. We've fixed the issue; the RAM was too small. Now we train on 2x RTX 3090, and it takes 3 minutes per 100 batches. Is this normal?
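As a sanity check, here is a back-of-envelope epoch-time estimate from that speed. The helper function, batch size, and utterance count below are assumptions for illustration, not values confirmed in this thread:

```python
# Hypothetical helper: estimate hours per epoch from the reported speed.
def hours_per_epoch(num_utts, batch_size, sec_per_100_batches):
    batches = num_utts / batch_size
    return batches / 100 * sec_per_100_batches / 3600

# Assumed numbers: ~1.1M utterances (roughly VoxCeleb2-dev scale),
# batch size 64, and the 3 minutes (180 s) per 100 batches reported above.
print(round(hours_per_epoch(1_100_000, 64, 180), 1))  # → 8.6
```

Under those assumptions, an epoch would take under half a day, which is much better than the earlier 1.5 days/epoch.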
The speed bottleneck is likely CPU and IO; you can try increasing num_workers.
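For illustration, a minimal generic PyTorch sketch of the relevant DataLoader knobs (the dataset here is a stand-in, not wespeaker's actual shard dataset):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyAudioDataset(Dataset):
    """Stand-in for a shard-based audio dataset."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        return torch.randn(200, 80)  # fake fbank features: 200 frames x 80 bins

loader = DataLoader(
    DummyAudioDataset(),
    batch_size=64,
    num_workers=4,      # raise this if CPU decoding / IO starves the GPUs
    pin_memory=True,    # faster host-to-GPU transfers
    prefetch_factor=2,  # batches prefetched per worker (needs num_workers > 0)
)
```

More workers help only if the CPU and disk can keep up; past that point they just add memory overhead, so it is worth profiling a few values.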
Hi @cdliang11, can you share the config for training ResNet221? In voxceleb/v2 there is no example config for ResNet221. I found ResNet221_LM on Hugging Face, but it seems to be the config for LM, the post-processing stage after ResNet221?
I checked voxceleb_resnet221_LM.yaml on Hugging Face, and I found that it should actually be voxceleb_resnet221.yaml; sorry, the filename is likely a typo. You can use it directly: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet221-LM/blob/main/voxceleb_resnet221_LM.yaml
Hi @cdliang11, I have a question about my custom model training. I'm in the first epoch, and while the loss is decreasing, the accuracy is still 0. Is this normal behavior for the beginning of training, or could it indicate a problem? Thanks for your help!