sd-scripts

Catch model saving error

Open Cauldrath opened this issue 2 years ago • 2 comments

If an error is thrown while saving a checkpoint, the script now shows the user the error and asks whether they would like to try saving again. This helps when something goes wrong at the end of an hours-long training run (for example, if there isn't enough disk space). The code was also rearranged so that the epochs and steps parameters are not altered and new_ckpt is only created if it is used.
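A minimal sketch of the retry-on-failure idea described above, not the actual code from this PR. The names `save_with_retry`, `save_fn`, and `ask` are hypothetical and chosen only for illustration; the real save functions in sd-scripts take different arguments.

```python
from typing import Callable


def save_with_retry(
    save_fn: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Call save_fn until it succeeds or the user declines to retry.

    save_fn and ask are hypothetical stand-ins: save_fn wraps whatever
    actually writes the checkpoint, ask prompts the user (input by default).
    Returns True if the checkpoint was saved, False if the user gave up.
    """
    while True:
        try:
            save_fn()
            return True
        except Exception as exc:  # broad on purpose: any save failure is shown to the user
            print(f"Error saving checkpoint: {exc}")
            answer = ask("Try saving again? [y/n]: ").strip().lower()
            if answer not in ("y", "yes"):
                return False
```

With this shape, a user who runs out of disk space can free some room and answer `y` instead of losing the trained weights.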

Cauldrath, Apr 19 '24 13:04

Thank you for this! This is a great idea.

However, it appears that some users train the model in a non-interactive environment. In those cases, the script will stop at the prompt (although it would crash anyway without this change). In addition, support for SD1.5/2.0 and LoRA would be nice.
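One possible way to handle the non-interactive case, sketched under the assumption that checking whether standard input is a terminal is an acceptable signal. This is not something the thread settles on, and `should_retry` is a hypothetical helper name.

```python
import sys
from typing import Callable, Optional, TextIO


def should_retry(
    stream: Optional[TextIO] = None,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask whether to retry a failed save, but only if a user can answer.

    Hypothetical helper: when stdin is missing or is not a terminal
    (batch job, piped input), return False so the script fails fast
    instead of waiting on a prompt nobody will see.
    """
    stream = sys.stdin if stream is None else stream
    if stream is None or not stream.isatty():
        return False
    answer = ask("Try saving again? [y/n]: ").strip().lower()
    return answer in ("y", "yes")
```

In a non-interactive run this falls back to the previous behavior (the error ends the run), while interactive users still get the chance to retry.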

I think we need to consider how to implement this appropriately.

kohya-ss, May 12 '24 12:05

I updated all of the model-saving logic to funnel into a single function that contains the try/catch, and consolidated the logic as much as possible while keeping all of the functionality the same. There is probably room to consolidate further if we change some of the differing logic to match.

Includes a merge of this PR: https://github.com/kohya-ss/sd-scripts/pull/1371

Cauldrath, Jun 12 '24 19:06