Catch model saving error
If an error is thrown while saving a checkpoint, the script now shows the error to the user and asks whether they would like to try saving again. This helps when something goes wrong at the end of an hours-long training run (for example, if you ran out of disk space). The code was also rearranged so that the epochs and steps parameters are not altered, and new_ckpt is only created if it is used.
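For readers skimming the thread, here is a minimal sketch of the retry-on-failure idea described above. It is not the code from this PR: `save_with_retry`, `save_fn`, and `ask` are hypothetical names chosen for illustration, and the real scripts may structure the prompt differently.

```python
from typing import Callable


def save_with_retry(
    save_fn: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Call save_fn; on failure, show the error and offer to retry.

    Hypothetical helper for illustration only.
    Returns True once a save succeeds, False if the user declines to retry.
    """
    while True:
        try:
            save_fn()
            return True
        except Exception as err:  # broad on purpose: any save failure should reach the user
            print(f"Error saving checkpoint: {err}")
            answer = ask("Try saving again? [y/n]: ").strip().lower()
            if answer not in ("y", "yes"):
                return False
```

Passing the save step in as a callable keeps the retry loop independent of which checkpoint format is being written, and the `ask` parameter makes the prompt replaceable in tests.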
Thank you for this! This is a great idea.
However, it appears that some users train the model in a non-interactive environment. In those cases, the script will stop at the prompt (although without this change it would crash anyway). In addition, supporting SD1.5/2.0 and LoRA would be nice.
I think we need to consider how to implement this appropriately.
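One possible way to handle the non-interactive case, offered only as a sketch and not as what this PR does: check whether standard input is a terminal before prompting, and skip the retry prompt otherwise. `should_retry_save` and its parameters are hypothetical names; `isatty()` is the standard method on Python stream objects.

```python
import sys
from typing import Callable, Optional, TextIO


def should_retry_save(
    err: Exception,
    stdin: Optional[TextIO] = None,
    ask: Callable[[str], str] = input,
) -> bool:
    """Decide whether to retry a failed save (hypothetical helper).

    Only prompts when stdin is an interactive terminal; otherwise
    reports the error and returns False so the caller can fail fast.
    """
    stream = sys.stdin if stdin is None else stdin
    # isatty() is False for pipes and redirected files, which is the
    # usual situation under job schedulers and background runs.
    if stream is None or not stream.isatty():
        print(f"Error saving checkpoint (non-interactive, not retrying): {err}",
              file=sys.stderr)
        return False
    answer = ask(f"Error saving checkpoint: {err}\nTry saving again? [y/n]: ")
    return answer.strip().lower() in ("y", "yes")
```

With a guard like this, batch jobs would fail the same way they do today instead of blocking on input, while interactive users would still get the chance to free disk space and retry.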
I updated all of the model-saving logic to funnel into a single function that contains the try/catch, and consolidated the logic as much as possible while keeping all functionality the same. There is probably room to consolidate further if we change some of the differing logic to match.
Includes a merge of this PR: https://github.com/kohya-ss/sd-scripts/pull/1371