Sys.exit hangs the process on the latest 8.2.0-Dev branch
Haxe 4.3.4, Lime 8.2.0-Dev (from https://github.com/openfl/lime/actions/runs/9472284613), HL 1.14, Windows 10.
This reproduces on HL, but I have also seen it on cpp in another project, so I expect cpp will behave the same here.
I just used the HelloWorld sample and added a key up test for the ESCAPE key:
public override function onKeyUp(key:KeyCode, modifier:KeyModifier):Void {
    switch (key) {
        case ESCAPE:
            Sys.exit(0);
        default:
    }
}
Build and run the program, and monitor it from MS Process Explorer: open the process-specific window and watch the Threads tab. Once the program renders, hit the ESCAPE key. All threads but the main program thread are killed off. If you check the stack of that remaining thread, you'll see something very like:
ntdll.dll!NtWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableSRW+0x131
KERNELBASE.dll!SleepConditionVariableSRW+0x29
lime.hdll!hid_write_output_report+0x9c175
lime.hdll!alIsEffect+0x521e
lime.hdll!alIsEffect+0x94f6
lime.hdll!alcDestroyContext+0x118
lime.hdll!lime_hl_al_delete_source+0xa2
lime.hdll!hid_write_output_report+0xe85d1
lime.hdll!hid_write_output_report+0xe8375
lime.hdll!hid_write_output_report+0xe86a9
lime.hdll!hid_write_output_report+0xca825
lime.hdll!hid_write_output_report+0xca799
lime.hdll!hid_write_output_report+0xca923
lime.hdll!hid_write_output_report+0xbc58f
lime.hdll!hid_write_output_report+0xbce1d
lime.hdll!hid_write_output_report+0xbcf39
ntdll.dll!RtlActivateActivationContextUnsafeFast+0x11d
ntdll.dll!LdrShutdownProcess+0x22a
ntdll.dll!RtlExitUserProcess+0xad
KERNEL32.DLL!FatalExit+0xb
ucrtbase.dll!exit+0x1dc
ucrtbase.dll!exit+0x7f
libhl.dll!hl_sys_exit+0x1e
Forgot to mention that using lime.system.System.exit() instead of Sys.exit() does work.
This doesn't seem to happen on Linux, so I assume it's Windows-specific.
This bug was actually present in the develop branch first. It only appeared in 8.2.0-Dev after develop was merged into it in 95baa58effaff3f6158d642873dd329a8f6e048a, which confuses me, since the changes shown there don't actually cause the issue.
That means the 8.2.0 release also has this bug.
I can confirm that this reproduces with Lime 8.2, but not Lime 8.1.3. Windows only. I'm actually reproducing it in Neko. I can also confirm that replacing Sys.exit() with lime.system.System.exit() works. Interestingly, lime.system.System.exit() calls Sys.exit() internally after some other cleanup. It seems that this other cleanup is suddenly required in Lime 8.2, but was not previously.
Regarding "after some other cleanup": I looked it up, and this seems to be the most substantial part of it:
https://github.com/openfl/lime/blob/88c2f6db98a8420a48bcef894cd0cf30fb7f70c4/src/lime/_internal/backend/native/NativeApplication.hx#L160-L167
I don't recall any major changes to Lime's C++ code, so it's probably not lime_application_quit().
AudioManager hasn't changed in 5 years, but it calls ALC functions, which changed when we updated the submodule. Maybe it's something there.
Yeah, that's what I was just narrowing it down to. If I comment out AudioManager.shutdown(), it freezes with lime.system.System.exit() too.
Not sure if this helps or not, but I dove deeper into AudioManager.shutdown(). When using lime.system.System.exit(), I need to comment out both of these calls before it hangs instead of exiting. If I comment out only one or the other, it exits successfully.
alc.destroyContext(currentContext);
if (device != null)
{
alc.closeDevice(device);
}
This looks suspiciously similar, but unfortunately, they just switched to a different library to solve it and there isn't much in the way of clues, other than the fact that they don't seem to clean up anything:
https://forum.qt.io/topic/137034/openal-soft-does-not-allow-to-stop-qt5-application/3
I mentioned this in the OpenFL Discord, but I'll also mention it here: the issue appears to be caused by lime_al_atexit being called when the process exits.
https://github.com/openfl/lime/blob/e24ab07125d06b99474663b2bc373308903f9ee3/project/src/media/openal/OpenALBindings.cpp#L3415
To quote the OpenAL Soft maintainer from a similar issue also caused by destroying the context at exit: https://github.com/kcat/openal-soft/issues/378#issuecomment-1485963899
OpenAL Soft has no idea it's being called during process exit/dll unload, when the system's loader lock is held. It's not safe to destroy contexts or close devices there because it will try to free system resources that can't be freed at that time, causing a deadlock as it waits on a lock that won't get released.
It was never a problem before, so there must be a way to clean things up without deadlocking. I'm trying to see if we can provide our own cleanup function. I'm pretty sure we don't necessarily have to worry about thread safety, since the process is already shutting down, so it may be possible to make sure the device at least gets closed. Looking at the cleanup functions in OpenAL, there seems to be some redundancy anyway, since alcCloseDevice cleans up any contexts associated with the device.
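One way to read the maintainer's quote is that teardown has to happen before exit() is entered, and that the exit hook itself should stay away from the audio backend. Here is a minimal sketch of that shape, not Lime's actual code: FakeAudioBackend, shutdownAudio, audioExitHook and installAudioExitHook are all hypothetical names, and the backend only records calls instead of talking to OpenAL.

```cpp
#include <atomic>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the audio backend: it only records which
// teardown calls were made, so the control flow can be inspected.
struct FakeAudioBackend {
    std::vector<std::string> calls;
    void destroyContext() { calls.push_back("destroyContext"); }
    void closeDevice()    { calls.push_back("closeDevice"); }
};

static FakeAudioBackend g_backend;
static std::atomic<bool> g_tornDown{false};
static std::atomic<bool> g_leftToOs{false};

// Explicit teardown, intended to run while the process is still fully
// alive (before exit() is entered). Safe to call more than once.
void shutdownAudio() {
    if (g_tornDown.exchange(true)) return;  // already torn down
    g_backend.destroyContext();
    g_backend.closeDevice();
}

// Exit hook: never touches the backend. If explicit teardown did not
// happen, it only notes that the resources are being left for the OS,
// rather than risking a wait on a lock that cannot be released during
// process exit.
void audioExitHook() {
    if (!g_tornDown.load()) {
        g_leftToOs.store(true);
    }
}

// Registers the hook with the C runtime; returns true on success.
bool installAudioExitHook() {
    return std::atexit(audioExitHook) == 0;
}
```

With this arrangement, an application that calls shutdownAudio() before exiting gets a full teardown, and one that calls exit() directly skips it instead of hanging.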
A couple of thoughts.
It has always struck me as a bit odd that flixel does not appear to have an orderly exit() call that in turn calls openfl, which then calls lime, and so on, so that all resources are cleaned up in an orderly manner and library invariants are respected. I have asked before in the flixel Discord, but the only answer for exit seemed to be Sys.exit(). I'll check again. In general, though, it makes sense to me that lime.system.System.exit() should be called explicitly somewhere in an orderly shutdown driven from the openfl shutdown. I assume, but don't know, that this bug is a problem of cleanup ordering: various hooks into Sys.exit() result in cleanup code being called in an unexpected order, and ordering is the root of most deadlocks in the end. I don't know whether there is a way to see if some OpenAL cleanup hook is firing before the lime exit code runs, but if there is, it might prove the above theory. Regardless, an orderly, top-down exit seems preferable anyway.
I realize this would be a breaking change. The only other thought I have is whether there is any way to control the order of the Sys.exit() hook executions, assuming the above theory about cleanup is correct.
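To make the "orderly top-down exit" idea concrete, here is a rough sketch of a shutdown chain in which each layer registers its cleanup as it starts up, and a single exit call runs them from the top layer down before handing off to the real exit function. This is only an illustration of the proposal, not how flixel, openfl or lime are implemented. ShutdownChain, add, run and exitProcess are hypothetical names.

```cpp
#include <cstdlib>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of cleanup steps, one per library layer.
class ShutdownChain {
public:
    // Each layer registers its cleanup as it initializes, lowest layer first.
    void add(std::string name, std::function<void()> step) {
        steps_.emplace_back(std::move(name), std::move(step));
    }

    // Runs every remaining step exactly once, most recently registered
    // first, and returns the layer names in the order they ran.
    std::vector<std::string> run() {
        std::vector<std::string> order;
        while (!steps_.empty()) {
            auto entry = std::move(steps_.back());
            steps_.pop_back();
            entry.second();
            order.push_back(entry.first);
        }
        return order;
    }

    // Orderly exit: clean up first, then pass the exit code on. The exit
    // function is a parameter so the caller decides what actually ends
    // the process (for example a lambda wrapping std::exit).
    void exitProcess(int code, const std::function<void(int)>& exitFn) {
        run();
        exitFn(code);
    }

private:
    std::vector<std::pair<std::string, std::function<void()>>> steps_;
};
```

In a real program the last argument would be something like `[](int c) { std::exit(c); }`. Because every cleanup step has already run by then, no exit hook needs to do any teardown of its own.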
There actually isn't much cleanup tied into Sys.exit(). From here on out, it will probably be recommended to use lime.system.System.exit(). In Lime's exit call, we clean up some things and then just call Sys.exit(). In C++, we hook into the process exit and try to clean up the OpenAL context and then close the device in case Sys.exit() is called directly, but due to some changes in the way OpenAL works, this no longer appears to be an option.
On the bright side, the OS does seem to clean things up on its own in short order. If you use lime.system.System.exit(), it will clean up appropriately.
Yeah, based on the discussion above, that seems to be the only option. As flixel is (kinda, mostly) an openfl app, it should probably call openfl.System.exit(), I guess. That function's notes say that AIR apps should do something different, though, so it's not as straightforward to do a generic job that will just work. That's just the purist in me wanting a nice clean solution.