NaN errors 5-10 min into training.
VM (Win10, CPU EPYC 2.3 GHz x 48 cores, 128 GB RAM, RTX A100 80GB)
Tested on my own workspace (data aligned and extracted at 512 res). The test model is 416 res, liae-udt, with slightly bumped dims. Tried with gradient clipping on and off; with it on, training lasts a little longer.
Training starts and lasts 400-4000 iterations, then crashes with NaN in the src/dst loss line, and the preview window crashes too.
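For readers unfamiliar with what gradient clipping does here, below is a minimal sketch of clipping by global norm in plain Python. It is an illustration only and not DeepFaceLab's own implementation; the function name `clip_by_global_norm` and the `max_norm` parameter are assumptions made for this example. It also shows why clipping can delay a blow-up but cannot repair gradients that are already NaN or inf.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is <= max_norm.

    Hypothetical helper for illustration; not taken from DeepFaceLab.
    Returns (clipped_grads, original_global_norm).
    """
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if not math.isfinite(global_norm):
        # Scaling NaN/inf still gives NaN/inf, so clipping cannot help here.
        raise FloatingPointError("non-finite gradient norm")
    if global_norm <= max_norm:
        # Already small enough: return an unchanged copy.
        return [list(vec) for vec in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0, 4.0]], max_norm=1.0)
    print(norm, clipped)
```

Clipping only bounds the size of each update. If the loss itself overflows, the gradients are non-finite before clipping is applied, which would be consistent with training lasting longer but still ending in NaN.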
================================================================================
Starting. Press "Enter" to stop training and save model.
Traceback (most recent call last):
File "G:\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\main.py", line 343, in
=================================================================================
SO FAR:
- Tested on a lower-spec model (RTM, 224 res) with the stock Elon/Stark footage: same result, but it takes longer to crash.
- Limited the number of cores in the system to 8.
- Tried a different fork.
- Unable to enable GPU scheduling: the option does not exist for some reason, even though I know how to set it on other systems.
I appreciate any feedback.
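One way to narrow down when the loss first goes bad is a small guard that stops at the first non-finite value. The sketch below is generic Python and assumes you can get the src and dst loss as plain floats each iteration; the function name `assert_finite_losses` and the keyword names are hypothetical and not part of DeepFaceLab.

```python
import math


def assert_finite_losses(iteration, **losses):
    """Raise as soon as any named loss value is NaN or inf.

    Hypothetical diagnostic helper; the loss names are whatever the caller passes.
    """
    bad = sorted(name for name, value in losses.items() if not math.isfinite(value))
    if bad:
        raise FloatingPointError(
            f"iteration {iteration}: non-finite loss in {', '.join(bad)}"
        )


if __name__ == "__main__":
    assert_finite_losses(399, src_loss=0.61, dst_loss=0.58)  # passes silently
    try:
        assert_finite_losses(400, src_loss=float("nan"), dst_loss=0.58)
    except FloatingPointError as err:
        print(err)
```

Running a check like this before each save would also avoid overwriting a good checkpoint with weights that have already become NaN.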
Hello,
Same issue. Did you fix it?
By switching to the conda Linux version.
On Wed, 17 Aug 2022, 12:05 jbarbaud wrote:
Hello,
Same issue. Did you fix it?
Issue solved / already answered (or it seems like user error), please close it.