DeepFaceLab NaN errors 5-10 min into training.

VM (Win10, CPU Epyc 2.3 GHz x 48 cores, 128 GB RAM, RTX A100 80 GB)

Tested on my own workspace (data), aligned and extracted at 512 res. The test model is 416 res, liae-udt, with slightly bumped dims. Tried with gradient clipping on and off; with it on, training lasts a little longer.

Training starts and lasts 400-4000 iterations, then crashes with NaN in the src/dst loss line, and the preview window crashes.

================================================================================
Starting. Press "Enter" to stop training and save model.

Traceback (most recent call last):
  File "G:\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\main.py", line 343, in <module>
    arguments.func(arguments)
  File "G:\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\main.py", line 132, in process_train
    Trainer.main(**kwargs)
  File "G:\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 317, in main
    lh_img = models.ModelBase.get_loss_history_preview(loss_history_to_show, iter, w, c)
  File "G:\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 627, in get_loss_history_preview
    ph_max = int ( (plist_max[col][p] / plist_abs_max) * (lh_height-1) )
ValueError: cannot convert float NaN to integer

[22:56:26][#001284][0175ms][nan][nan]

=================================================================================

SO FAR:
- Tested on a lower-spec model (RTM 224 res model + stock Elon/Stark footage): same result, but it takes longer to crash.
- Limited the number of cores in the system to 8.
- Tried a different fork.
- Unable to enable GPU scheduling: the option does not exist for some reason, even though I know how to set it on other systems.
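For context on the traceback above: the ValueError is what Python raises whenever `int()` receives a NaN, so the preview crash looks like a downstream symptom of the loss having already become NaN rather than the root cause. The sketch below is a standalone illustration, not DeepFaceLab code; the helper names `first_non_finite` and `safe_bar_height` and the sample loss values are hypothetical.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/inf loss value, or None if all are finite."""
    for i, value in enumerate(losses):
        if not math.isfinite(value):
            return i
    return None


def safe_bar_height(value, abs_max, height):
    """Scale a loss value to a pixel row, returning 0 instead of raising on NaN/inf."""
    ratio = value / abs_max if abs_max else 0.0
    if not math.isfinite(ratio):
        return 0
    return int(ratio * (height - 1))


# Reproduce the exception from the traceback in isolation.
try:
    int(float("nan"))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer

# Hypothetical loss history: the third entry is where training diverged.
history = [0.71, 0.64, float("nan"), float("nan")]
print(first_non_finite(history))
print(safe_bar_height(history[2], 1.0, 101))
```

A guard like this would only keep a preview from crashing; it would not stop the model from diverging, so the iteration at which the first non-finite loss appears is probably the more useful thing to track down.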

I appreciate any feedback.

May 28 '22 06:05 zabique

Hello,

Same issue. Did you fix it?

Aug 17 '22 11:08 jeremybarbaud

By switching to the conda Linux version.

Aug 17 '22 12:08 zabique

Issue solved / already answered (or it seems like user error), please close it.

Jun 08 '23 22:06 joolstorrentecalo