client-toolkit icon indicating copy to clipboard operation
client-toolkit copied to clipboard

Segmentation fault in safe code when writing to wayland window managed by winit

Open john01dav opened this issue 3 years ago • 3 comments

I am currently working on rewriting the Wayland backend in my crate Softbuffer. I am using this client toolkit to do so since its automatic buffer management from the pool seems useful. I'm writing my code based on the image_viewer example in commit f03c4b7dea89648dbcba43de80eb14687e0e9da8. My current code is here, which is a pretty good almost minimum example.

The issue is that when I run this (on Gnome 42.4 on Fedora), I get segmentation faults shortly after running. According to GCC these faults take place in wayland-client at src/native_lib/event_queue.rs. It seems quite likely to me that I am doing something wrong, but this also looks like a safety bug in either the client toolkit or wayland-client, as my only unsafe code for this backend in Softbuffer is to cast the handles from raw window handle, and the client toolkit advertises safety in its readme on crates.io.

To reproduce this bug, simply run an example. I am testing with the animation example primarily, however the others seem to have the problem too.

cargo run --example animation

john01dav avatar Aug 28 '22 06:08 john01dav

It should work if you store the event queue and dispatch it during set_buffer_safe.

diff --git a/Cargo.toml b/Cargo.toml
index 93f499c..d506f7c 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -27,9 +27,6 @@ wayland-client = { version = "0.29", features = ["use_system_lib"], default_feat
 smithay-client-toolkit = { version = "0.16.0", optional = true}
 x11-dl = "2.19.1"
 
-[patch.crates-io]
-wayland-client = { path = "../wayland-client-0.29.4" }
-
 [target.'cfg(target_os = "windows")'.dependencies.winapi]
 version = "0.3.9"
 features = ["windef", "wingdi", "winuser"]
diff --git a/src/wayland.rs b/src/wayland.rs
index b154d19..272b7f6 100644
--- a/src/wayland.rs
+++ b/src/wayland.rs
@@ -1,21 +1,16 @@
 use crate::{error::unwrap, GraphicsContextImpl, SoftBufferError};
+use bytemuck::cast_slice;
 use raw_window_handle::{HasRawWindowHandle, WaylandDisplayHandle, WaylandWindowHandle};
 use smithay_client_toolkit::shm::AutoMemPool;
-use std::ops::Deref;
-use std::{
-    fs::File,
-    io::Write,
-    os::unix::prelude::{AsRawFd, FileExt},
-};
-use bytemuck::cast_slice;
+
 use wayland_client::{
-    protocol::{wl_buffer::WlBuffer, wl_shm::WlShm, wl_surface::WlSurface},
+    protocol::{wl_shm::WlShm, wl_surface::WlSurface},
     sys::client::wl_display,
     Attached, Display, EventQueue, GlobalManager, Main, Proxy,
 };
 
 pub struct WaylandImpl {
-    //event_queue: EventQueue,
+    event_queue: EventQueue,
     pool: AutoMemPool,
     surface: WlSurface,
 }
@@ -30,7 +25,10 @@ impl WaylandImpl {
         Self::new_safe(display, surface)
     }
 
-    fn new_safe<W: HasRawWindowHandle>(display: Display, surface: WlSurface) -> Result<Self, SoftBufferError<W>>{
+    fn new_safe<W: HasRawWindowHandle>(
+        display: Display,
+        surface: WlSurface,
+    ) -> Result<Self, SoftBufferError<W>> {
         let mut event_queue = display.create_event_queue();
         let attached_display = display.attach(event_queue.token());
         let globals = GlobalManager::new(&attached_display);
@@ -48,10 +46,35 @@ impl WaylandImpl {
             "Failed to create AutoMemPool from smithay client toolkit",
         )?;
 
-        Ok(Self { pool, surface })
+        Ok(Self {
+            pool,
+            surface,
+            event_queue,
+        })
     }
 
-    fn set_buffer_safe(&mut self, buffer: &[u32], width: u16, height: u16){
+    fn set_buffer_safe(&mut self, buffer: &[u32], width: u16, height: u16) {
+        // The second method will try to read events from the socket. It is done in two
+        // steps, first the read is prepared, and then it is actually executed. This allows
+        // lower contention when different threads are trying to trigger a read of events
+        // concurently
+        if let Some(guard) = self.event_queue.prepare_read() {
+            // prepare_read() returns None if there are already events pending in this
+            // event queue, in which case there is no need to try to read from the socket
+            if let Err(e) = guard.read_events() {
+                if e.kind() != ::std::io::ErrorKind::WouldBlock {
+                    // if read_events() returns Err(WouldBlock), this just means that no new
+                    // messages are available to be read
+                    eprintln!(
+                        "Error while trying to read from the wayland socket: {:?}",
+                        e
+                    );
+                }
+            }
+        }
+
+        self.event_queue.dispatch(&mut (), |_, _, _| {}).unwrap();
+
         let (canvas, new_buffer) = self
             .pool
             .buffer(
@@ -62,7 +85,7 @@ impl WaylandImpl {
             )
             .unwrap();
 
-        assert_eq!(canvas.len(), buffer.len()*4);
+        assert_eq!(canvas.len(), buffer.len() * 4);
         canvas.copy_from_slice(cast_slice(buffer));
 
         self.surface.attach(Some(&new_buffer), 0, 0);
@@ -81,15 +104,11 @@ impl WaylandImpl {
         }
 
         self.surface.commit();
-        //self.event_queue.dispatch(&mut next_action, |_, _, _| {}).unwrap();
     }
-
 }
 
 impl GraphicsContextImpl for WaylandImpl {
-
     unsafe fn set_buffer(&mut self, buffer: &[u32], width: u16, height: u16) {
         self.set_buffer_safe(buffer, width, height);
     }
-
 }

cmeissl avatar Aug 28 '22 12:08 cmeissl

@cmeissl

I considered trying something like that myself, but when I have the event queue not dropped by the time that set_buffer is called, the entire desktop (mouse movement and everything) freezes, and then my program terminates with the following error message:

wl_shm_pool@42: error 2: shrinking pool invalid

This happens even if I process the event queue's events exactly as you do in your example code.

john01dav avatar Aug 28 '22 20:08 john01dav

@cmeissl

I considered trying something like that myself, but when I have the event queue not dropped by the time that set_buffer is called, the entire desktop (mouse movement and everything) freezes, and then my program terminates with the following error message:

wl_shm_pool@42: error 2: shrinking pool invalid

This happens even if I process the event queue's events exactly as you do in your example code.

Dropping the event queue is probably the cause of the segfault. Dispatching the queue before everything else should fix the pool error (it does for me). I think the yanky mouse could be the result of submitting a lot of buffers in a short period. Had not much time to check but I think the example renders without a delay, so you fill up the server with buffers.

cmeissl avatar Aug 28 '22 20:08 cmeissl