Retrieval-based-Voice-Conversion-WebUI

[Feature Request] Save the last n checkpoints

Open · nikita488 opened this issue 2 years ago · 3 comments

Saving only the latest '.ckpt' file to save disk space is a great idea, but it can backfire when something unexpected happens. For example, an error in Colab can corrupt the latest D/G_2333333.pth file (leaving it only partially written), or an interruption can leave D saved without G (or vice versa).

I trained a few models on Colab, and many of them ended up with a corrupted D or G checkpoint because an error occurred while saving. I lost hours of training and had to start from scratch, which is very annoying.

My idea is to add a UI option such as "Save last n checkpoints". For example, if this option is set to 10 and 'Save frequency (save_every_epoch)' is also set to 10, training will save D and G checkpoints at epochs 10, 20, 30 ... 100. After that, it deletes the oldest pair (D_10.pth and G_10.pth in this example) and saves the latest one: delete epoch 10 and save 110, then delete epoch 20 and save 120, and so on. That way we always have some backups and can resume training from an older checkpoint if something goes wrong with the latest one.
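A minimal sketch of the "keep last n" rotation in plain Python, for illustration only. It assumes checkpoints are named `G_<epoch>.pth` / `D_<epoch>.pth` in one directory, and `save_and_rotate`, `prune_checkpoints`, `list_checkpoint_epochs` and the `save_fn(kind, path)` callback are hypothetical names, not RVC's actual code. One deliberate deviation from the order described above: it saves the new pair first and prunes afterwards, so an interrupted save cannot cost you an existing backup.

```python
from pathlib import Path
import re

# Hypothetical naming scheme: G_<epoch>.pth and D_<epoch>.pth
_CKPT_RE = re.compile(r"^([GD])_(\d+)\.pth$")


def list_checkpoint_epochs(ckpt_dir):
    """Return the sorted epochs for which BOTH G_<epoch>.pth and D_<epoch>.pth exist."""
    found = {"G": set(), "D": set()}
    for entry in Path(ckpt_dir).iterdir():
        m = _CKPT_RE.match(entry.name)
        if m:
            found[m.group(1)].add(int(m.group(2)))
    return sorted(found["G"] & found["D"])


def prune_checkpoints(ckpt_dir, keep_last_n):
    """Delete files older than the newest keep_last_n complete G/D pairs.

    keep_last_n <= 0 means "keep everything". Returns the deleted file names.
    """
    complete = list_checkpoint_epochs(ckpt_dir)
    if keep_last_n <= 0 or len(complete) <= keep_last_n:
        return []
    cutoff = complete[-keep_last_n]  # oldest epoch that is kept
    removed = []
    # Snapshot the listing before unlinking anything.
    for entry in list(Path(ckpt_dir).iterdir()):
        m = _CKPT_RE.match(entry.name)
        if m and int(m.group(2)) < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)


def save_and_rotate(ckpt_dir, epoch, save_fn, keep_last_n):
    """Write G and D for `epoch` via save_fn(kind, path), then prune old pairs."""
    ckpt_dir = Path(ckpt_dir)
    save_fn("G", ckpt_dir / f"G_{epoch}.pth")
    save_fn("D", ckpt_dir / f"D_{epoch}.pth")
    # Pruning runs only if both saves returned, and only complete pairs
    # count toward keep_last_n, so a half-written pair never evicts a backup.
    return prune_checkpoints(ckpt_dir, keep_last_n)
```

In a PyTorch training loop, `save_fn` would presumably wrap `torch.save(...)` of the generator or discriminator state; with `keep_last_n=10` and saves every 10 epochs, the directory ends up holding epochs 30 to 120 after the 120th epoch, matching the example above.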

nikita488 · Jul 04 '23 20:07

OK, in the next version I will change it to save the latest 2 checkpoints.

RVC-Boss · Jul 05 '23 03:07

Great!

nikita488 · Jul 05 '23 07:07

Renaming the existing checkpoints to G_*.pth.bak / D_*.pth.bak before saving the new ones would probably do.
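A rough sketch of the rename-to-.bak idea, assuming the fixed "latest only" file names G_2333333.pth / D_2333333.pth mentioned in the issue. `save_with_backup` and the `save_fn(kind, path)` callback are hypothetical names for illustration, not RVC's code. Both existing files are renamed before either new file is written, so the .bak pair always comes from the same earlier save even if the new save is interrupted halfway.

```python
from pathlib import Path


def save_with_backup(ckpt_dir, save_fn, tag="2333333"):
    """Keep the previous G/D pair as *.pth.bak, then write the new pair.

    Assumptions (hypothetical): files are named G_<tag>.pth / D_<tag>.pth,
    and save_fn(kind, path) writes a single model's checkpoint.
    """
    ckpt_dir = Path(ckpt_dir)
    paths = {kind: ckpt_dir / f"{kind}_{tag}.pth" for kind in ("G", "D")}
    # Step 1: rename BOTH current files first; Path.replace overwrites
    # any older .bak.
    for path in paths.values():
        if path.exists():
            path.replace(path.with_name(path.name + ".bak"))
    # Step 2: only then write the new checkpoints.
    for kind, path in paths.items():
        save_fn(kind, path)
```

One caveat of this approach: if a save completes but writes a corrupt file, the next call will still rotate that file into .bak, so it protects against interrupted saves rather than silent corruption.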

nikita488 · Jul 05 '23 09:07