Random crash in wayfire_foreign_toplevel::init_request_handlers() (race condition with xdg-popups?)
Describe the bug This happened once randomly while I was experimenting in follow up to #2422 -- Wayfire crashed while clicking on an icon with a subdock in Cairo-Dock. My hunch is that this is a race condition with xdg-popups similar to #772 but in this case Wayfire trying to use an "inert" popup (more detailed reasoning below).
To Reproduce So far I cannot actually reproduce, which is not surprising, since it is likely extremely sensitive to timing, and I've already been using the equivalent setup for ~2-3 years without issues. I do have an idea how to write a reproducible example based on my previous experience with similar race conditions e.g. this and this. Once I have a bit more time, I can experiment with this.
What I did the time it happened:
- Start Cairo-Dock (beta version) and also this plugin (note: based on my previous experience, any other interaction that triggers a close of a subdock at the wrong time should also work).
- Open at least two instances of an app (so they are grouped into a subdock)
- Click the corresponding icon.
Expected behavior Scale is started showing only the views of the selected app. The corresponding subdock might appear as you move the mouse over the icon, and it might be closed once scale starts.
Screenshots or stacktrace This is from the syslog and unfortunately I don't have debug info enabled:
Aug 24 12:30:25 thinkpad-x230 wayfire[2351]: II 24-08-24 12:30:25.053 - [src/view/xdg-shell/xdg-toplevel-view.cpp:29] new xdg_shell_stable surface: app-id: sakura
Aug 24 12:30:25 thinkpad-x230 systemd[2109]: Started vte-spawn-6d04869b-f1bd-42b2-979d-8bd17ba5b16e.scope - VTE child process 3331 launched by sakura process 3323.
Aug 24 12:30:26 thinkpad-x230 wayfire[2351]: II 24-08-24 12:30:26.118 - [src/view/xdg-shell/xdg-toplevel-view.cpp:29] new xdg_shell_stable surface: app-id: sakura
Aug 24 12:30:26 thinkpad-x230 systemd[2109]: Started vte-spawn-98c67524-bf95-4b5e-b2d5-ea8498e0414e.scope - VTE child process 3353 launched by sakura process 3345.
Aug 24 12:30:26 thinkpad-x230 wayfire[2351]: II 24-08-24 12:30:26.141 - [src/view/xdg-shell.cpp:124] New xdg popup
Aug 24 12:30:26 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:26.141 - [types/xdg_shell/wlr_xdg_surface.c:169] A configure is scheduled for an uninitialized xdg_surface 0x5a47d57ede60
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: II 24-08-24 12:30:30.016 - [src/view/xdg-shell.cpp:124] New xdg popup
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.016 - [types/xdg_shell/wlr_xdg_surface.c:169] A configure is scheduled for an uninitialized xdg_surface 0x5a47d5827870
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.030 - [src/main.cpp:141] Fatal error: Segmentation fault
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.068 - #1 signal_handler(int) main.cpp:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.652 - #2 __restore_rt libc_sigaction.c:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.666 - #3 wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}::operator()(void*) const ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.684 - #4 wf::wl_listener_wrapper::emit(void*) ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.700 - #5 wl_signal_emit_mutable ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.717 - #6 foreign_toplevel_handle_set_rectangle wlr_foreign_toplevel_management_v1.c:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.730 - #7 ffi_prep_go_closure ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.745 - #8 ?? ??:0
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.756 - #9 ffi_call ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.770 - #10 ?? ??:0
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.784 - #11 wl_array_add ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.798 - #12 wl_event_loop_dispatch ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.814 - #13 wl_display_run ??:?
Aug 24 12:30:30 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:30.835 - #14 main ??:?
Aug 24 12:30:31 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:31.020 - #15 __libc_start_call_main ../sysdeps/x86/libc-start.c:74
Aug 24 12:30:31 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:31.207 - #16 call_init ../csu/libc-start.c:128
Aug 24 12:30:31 thinkpad-x230 wayfire[2351]: EE 24-08-24 12:30:31.229 - #17 _start ??:?
Wayfire version git (44e1fa9c62e1f8c9f35cedbf3e86c6a0247e0b79)
What I think happens
- With the mouse pointing to an icon that represents multiple instances (multiple open views) of an app, CD will open a subdock, which is an xdg-popup (BTW I'm aware this is not a very good design choice, but changing it will take some work; however, it should not trigger a crash in the compositor)
- CD will send zwlr_foreign_toplevel_handle_v1::set_rectangle to update the minimize position of each view to be within the newly opened subdock
- Clicking on the icon, CD will send an IPC request to start scale
- When scale starts, Wayfire will close the xdg-popup
- The request from step
#2is processed by Wayfire, triggering the crash seen in the stacktrace above as it now refers to an inert popup's surface - CD receives an xdg_popup::popup_done event, but it is too late (since it already sent a related request in step
#2)
Note: likely step #2 actually happens later, but still before step #6. Another possibility is that there is actually a bug in CD and it does send a set_rectangle request on an already-closed popup, and I'll look into this separately, but IMO this should lead to a protocol error and not a crash.
If your hunch is correct, maybe https://0x0.st/Xy3U.diff will help. But it is hard to say for sure without at least address sanitizer so that we get line numbers.
So I'm able to reproduce this reliably with this code: https://github.com/dkondor/xdg_popup_crasher
Full stacktrace:
EE 24-08-24 15:05:53.039 - #1 signal_handler(int) ../src/main.cpp:143
EE 24-08-24 15:05:53.621 - #2 __restore_rt libc_sigaction.c:?
EE 24-08-24 15:05:53.659 - #3 wayfire_foreign_toplevel::handle_minimize_hint(wf::toplevel_view_interface_t*, wf::view_interface_t*, wlr_box) ../plugins/protocols/foreign-toplevel.cpp:235
EE 24-08-24 15:05:53.699 - #4 wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}::operator()(void*) const ../plugins/protocols/foreign-toplevel.cpp:221
EE 24-08-24 15:05:53.739 - #5 void std::__invoke_impl<void, wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}&, void*>(std::__invoke_other, wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}&, void*&&) /usr/include/c++/13/bits/invoke.h:61
EE 24-08-24 15:05:53.777 - #6 std::enable_if<is_invocable_r_v<void, wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}&, void*>, void>::type std::__invoke_r<void, wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}&, void*>(wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}&, void*&&) /usr/include/c++/13/bits/invoke.h:117
EE 24-08-24 15:05:53.816 - #7 std::_Function_handler<void (void*), wayfire_foreign_toplevel::init_request_handlers()::{lambda(void*)#3}>::_M_invoke(std::_Any_data const&, void*&&) /usr/include/c++/13/bits/std_function.h:291
EE 24-08-24 15:05:53.907 - #8 std::function<void (void*)>::operator()(void*) const ../src/util.cpp:591
EE 24-08-24 15:05:54.000 - #9 wf::wl_listener_wrapper::emit(void*) ../src/wl-listener-wrapper.tpp:59
EE 24-08-24 15:05:54.089 - #10 wf::handle_wrapped_listener(wl_listener*, void*) ../src/wl-listener-wrapper.tpp:11
EE 24-08-24 15:05:54.099 - #11 wl_signal_emit_mutable ??:?
EE 24-08-24 15:05:54.118 - #12 foreign_toplevel_handle_set_rectangle wlr_foreign_toplevel_management_v1.c:?
EE 24-08-24 15:05:54.129 - #13 ffi_prep_go_closure ??:?
EE 24-08-24 15:05:54.141 - #14 ?? ??:0
EE 24-08-24 15:05:54.150 - #15 ffi_call ??:?
EE 24-08-24 15:05:54.162 - #16 ?? ??:0
EE 24-08-24 15:05:54.173 - #17 wl_array_add ??:?
EE 24-08-24 15:05:54.184 - #18 wl_event_loop_dispatch ??:?
EE 24-08-24 15:05:54.194 - #19 wl_display_run ??:?
EE 24-08-24 15:05:54.279 - #20 main ../src/main.cpp:449
EE 24-08-24 15:05:54.404 - #21 __libc_start_call_main ../sysdeps/x86/libc-start.c:74
EE 24-08-24 15:05:54.531 - #22 call_init ../csu/libc-start.c:128
EE 24-08-24 15:05:56.382 - #23 _start ??:?
Nice, thanks for the good reproducer app! Did you check whether my patch helps at all?
Actually not the previous patch, it had a small mistake, see https://0x0.st/XyY3.diff
With your patch I seem to get a crash whenever closing a popup the normal way:
II 24-08-24 15:23:10.565 - [src/view/xdg-shell.cpp:124] New xdg popup
EE 24-08-24 15:23:10.565 - [types/xdg_shell/wlr_xdg_surface.c:169] A configure is scheduled for an uninitialized xdg_surface 0x63ce97ed03b0
EE 24-08-24 15:23:11.696 - [src/main.cpp:141] Fatal error: Segmentation fault
EE 24-08-24 15:23:11.783 - #1 signal_handler(int) ../src/main.cpp:143
EE 24-08-24 15:23:12.372 - #2 __restore_rt libc_sigaction.c:?
EE 24-08-24 15:23:13.471 - #3 wayfire_xdg_popup::destroy() ../src/view/xdg-shell.cpp:304
EE 24-08-24 15:23:14.586 - #4 auto xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}::operator()<void*>(void*) const ../src/view/xdg-shell.cpp:410
EE 24-08-24 15:23:15.695 - #5 void std::__invoke_impl<void, xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}&, void*>(std::__invoke_other, xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}&, void*&&) /usr/include/c++/13/bits/invoke.h:61
EE 24-08-24 15:23:16.795 - #6 std::enable_if<is_invocable_r_v<void, xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}&, void*>, void>::type std::__invoke_r<void, xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}&, void*>(xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}&, void*&&) /usr/include/c++/13/bits/invoke.h:117
EE 24-08-24 15:23:17.915 - #7 std::_Function_handler<void (void*), xdg_popup_controller_t::xdg_popup_controller_t(wlr_xdg_popup*)::{lambda(auto:1)#1}>::_M_invoke(std::_Any_data const&, void*&&) /usr/include/c++/13/bits/std_function.h:291
EE 24-08-24 15:23:18.006 - #8 std::function<void (void*)>::operator()(void*) const ../src/util.cpp:591
EE 24-08-24 15:23:18.092 - #9 wf::wl_listener_wrapper::emit(void*) ../src/wl-listener-wrapper.tpp:59
EE 24-08-24 15:23:18.179 - #10 wf::handle_wrapped_listener(wl_listener*, void*) ../src/wl-listener-wrapper.tpp:11
EE 24-08-24 15:23:18.191 - #11 wl_signal_emit_mutable ??:?
EE 24-08-24 15:23:18.205 - #12 destroy_xdg_popup :?
EE 24-08-24 15:23:18.220 - #13 xdg_surface_handle_role_resource_destroy wlr_xdg_surface.c:?
EE 24-08-24 15:23:18.232 - #14 wl_resource_queue_event ??:?
EE 24-08-24 15:23:18.244 - #15 wl_resource_destroy ??:?
EE 24-08-24 15:23:18.256 - #16 ffi_prep_go_closure ??:?
EE 24-08-24 15:23:18.267 - #17 ?? ??:0
EE 24-08-24 15:23:18.278 - #18 ffi_call ??:?
EE 24-08-24 15:23:18.289 - #19 ?? ??:0
EE 24-08-24 15:23:18.301 - #20 wl_array_add ??:?
EE 24-08-24 15:23:18.312 - #21 wl_event_loop_dispatch ??:?
EE 24-08-24 15:23:18.323 - #22 wl_display_run ??:?
EE 24-08-24 15:23:18.399 - #23 main ../src/main.cpp:449
EE 24-08-24 15:23:18.524 - #24 __libc_start_call_main ../sysdeps/x86/libc-start.c:74
EE 24-08-24 15:23:18.647 - #25 call_init ../csu/libc-start.c:128
EE 24-08-24 15:23:20.477 - #26 _start ??:?
My bad again, the line I wnat to add should be put 1 line earlier .. https://0x0.st/XyYg.diff
Yes, the latest one works (it writes the error message about unknown surface which is expected in this case). Thanks!