
[Question] Is this training log normal or not?

Open hungnvk54 opened this issue 10 months ago • 10 comments

Hi,

First of all, I want to thank you for your contribution.

Today I used your example to retrain voxceleb/ResNet34. The dataset is the default vox1 and vox2 (downloaded via your default utils scripts) plus my private 800 speaker IDs (I mixed my private data with the vox2 data).

After training 1 epoch, the training log output is:

[Image: training log screenshot]

The Acc is only 0.015 - 0.04 in the first epoch. Is this normal or not?

(I ask because the training time is very long (about one week), and waiting until training finishes before checking is not a good idea.)

Thank you in advance.

hungnvk54 avatar Jun 23 '25 08:06 hungnvk54

@hungnvk54 I think it is normal. The picture shows a very early stage of training: only 100 batches have passed, and not even one complete epoch has finished.

cdliang11 avatar Jun 23 '25 09:06 cdliang11
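(For context: at the very start of training, a softmax classifier over thousands of speakers is essentially guessing, so top-1 accuracy near chance level is expected. A minimal sketch of that chance level, assuming the training set is roughly vox2 dev (5994 speakers) plus the 800 private IDs and that Acc in the log is printed as a percentage:)

```python
# Back-of-envelope chance-level accuracy for a randomly initialized
# speaker classifier (numbers are assumptions, not taken from the log).
num_speakers = 5994 + 800          # vox2 dev speakers + private IDs (assumed)
chance_top1 = 1.0 / num_speakers   # probability of guessing the right speaker
print(f"chance top-1 accuracy: {chance_top1:.4%}")  # ~0.0147%
```

Under those assumptions, an Acc around 0.015 after only ~100 batches is consistent with a model that has not yet learned much, which is expected this early.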

I'm using the shard data type. So is the batch size here counted in number of audio files or number of shards?

hungnvk54 avatar Jun 23 '25 09:06 hungnvk54

Number of audio files.

cdliang11 avatar Jun 23 '25 09:06 cdliang11

@cdliang11 Thanks for your response.

If it is counted in audio files, that means we need about 1.5 days/epoch, which is too long.

I'm training on a server with two RTX 2080 Ti GPUs. Is that normal?

And which server was the pretrained model trained on?

hungnvk54 avatar Jun 23 '25 09:06 hungnvk54

  1. It's abnormal.
  2. Pretrained models: https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md

cdliang11 avatar Jun 23 '25 10:06 cdliang11

Thanks @cdliang11. We've fixed the issue; the RAM was too small. Now we train on two RTX 3090s and it takes about 3 minutes per 100 batches. Is this normal?

[Image: training log screenshot]

hungnvk54 avatar Jun 24 '25 08:06 hungnvk54
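(For reference, a rough way to turn a per-batch timing like "3 minutes per 100 batches" into an epoch-time estimate; the same arithmetic can sanity-check a figure like the earlier 1.5 days/epoch. The utterance count and batch size below are assumptions for illustration, not values from this issue:)

```python
# Back-of-envelope epoch-time estimate from per-batch timing.
# Concrete numbers here are assumptions for illustration only.
num_utts = 1_092_009                 # roughly VoxCeleb2 dev utterances (assumed)
batch_size = 256                     # per-GPU batch size (assumed)
num_gpus = 2
secs_per_batch = 3 * 60 / 100        # "3 minutes per 100 batches" -> 1.8 s/batch

steps_per_epoch = num_utts / (batch_size * num_gpus)
epoch_hours = steps_per_epoch * secs_per_batch / 3600
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{epoch_hours:.1f} hours/epoch")
```

With these assumed values this comes out to roughly 2,100 steps and about one hour per epoch, so per-batch time is what dominates the wall-clock cost.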

The speed bottleneck is probably related to CPU and I/O; you can try increasing num_workers.

cdliang11 avatar Jun 24 '25 16:06 cdliang11
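(In WeSpeaker the loader settings live in the recipe's YAML config rather than in user code; the snippet below is only a plain PyTorch sketch of the knobs that usually matter when the data pipeline, not the GPU, is the bottleneck. The dataset and parameter values are assumptions for illustration:)

```python
from torch.utils.data import DataLoader, Dataset

class DummyUtts(Dataset):
    """Stand-in dataset; in WeSpeaker the shard dataset plays this role."""
    def __len__(self):
        return 10_000
    def __getitem__(self, idx):
        return idx

# More workers overlap audio decoding/augmentation with GPU compute;
# pin_memory and prefetch_factor help hide host-to-device copies and I/O latency.
loader = DataLoader(
    DummyUtts(),
    batch_size=256,          # assumed value
    num_workers=8,           # raise this if CPUs sit idle and the GPU is starved
    pin_memory=True,
    prefetch_factor=4,       # batches prefetched per worker
    persistent_workers=True, # avoid re-spawning workers every epoch
)
```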

Hi @cdliang11, can you share the config for training ResNet221? In voxceleb/v2 there is no example config for ResNet221. I found the ResNet221_LM model on Hugging Face, but its config seems to be for LM (large-margin fine-tuning), i.e. the post-processing step after training ResNet221, right?

hungnvk54 avatar Jun 25 '25 01:06 hungnvk54

I checked voxceleb_resnet221_LM.yaml on Hugging Face, and I found that it should really be voxceleb_resnet221.yaml; sorry, the file name is probably a typo. You can use it directly: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet221-LM/blob/main/voxceleb_resnet221_LM.yaml

cdliang11 avatar Jun 25 '25 04:06 cdliang11

> @hungnvk54 I think it is normal. The picture shows a very early stage of training: only 100 batches have passed, and not even one complete epoch has finished.

[Image: training log screenshot]

Hi @cdliang11 I have a question about my custom model training. I'm in the first epoch and while the loss is decreasing, the accuracy is still 0. Is this normal behavior for the beginning of training, or could this indicate a problem? Thanks for your help!

MM-WW55 avatar Jul 29 '25 07:07 MM-WW55