Threading (?) causing glibc/low-level crashes
Platform
Knowing the platform greatly narrows down the potential causes of the problem.
- Platform: linux-arm32/64 (Raspberry Pi 3/4), amd64
- OS version: buster (arm32), bookworm (arm64 = aarch64), Ubuntu 24.04
- hid4java version: 0.8.0
- Java version: openjdk 11.0.23 (arm32, amd64) and 17.0.11 (aarch64), respectively, on those platforms
To Reproduce
Steps to reproduce the behavior:
Write a trivial program:
HidServices hidServices = HidManager.getHidServices(new HidServicesSpecification());
while (true) {
    hidServices.getAttachedHidDevices(); // enumerate in a tight loop
}
Let it run for a while on the specified platforms.
Expected behavior
Runs without issues forever.
Screenshots and logs
I have observed three crash modes so far (note that I have a little script running the app and logging some output, but the basic program is as above).
All of them often appear within a few minutes of running that loop. However, sometimes they don't appear for a long time, or only after I have plugged in some devices and read/written some data to them.
2024-07-18T10:53:34,225 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:34,226 INFO [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:34,227 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:34,228 INFO [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:34,229 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
double free or corruption (!prev)
./run.sh: line 7: 1484 Aborted MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
FATAL ERROR EXIT CODE 134 AT ./run.sh:7
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
corrupted size vs. prev_size
./run.sh: line 7: 2845 Aborted MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
FATAL ERROR EXIT CODE 134 AT ./run.sh:7
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - =======================
2024-07-18T10:53:44,120 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000007fa7bada9c, pid=2754, tid=2802
#
# JRE version: OpenJDK Runtime Environment (17.0.11+9) (build 17.0.11+9-Debian-1deb12u1)
# Java VM: OpenJDK 64-Bit Server VM (17.0.11+9-Debian-1deb12u1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# C [libc.so.6+0x8da9c]
[timeout occurred during error reporting in step "printing problematic frame"] after 30 s.
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/pi/hid4java-apd-test/hs_err_pid2754.log
# [ timer expired, abort... ]
./run.sh: line 7: 2754 Aborted MAVEN_OPTS="-ea" mvn package exec:java "-Dexec.mainClass=org.example.Main"
FATAL ERROR EXIT CODE 134 AT ./run.sh:7
Or, on Ubuntu 24.04:
2024-07-18T11:07:20,940 INFO [org.example.Main.main()] org.example.Main - =======================
2024-07-18T11:07:20,940 INFO [org.example.Main.main()] org.example.Main - enumerate hid devices...
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000074a8142ab7ec, pid=12909, tid=12965
#
# JRE version: OpenJDK Runtime Environment (11.0.23+9) (build 11.0.23+9-post-Ubuntu-1ubuntu1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.23+9-post-Ubuntu-1ubuntu1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libc.so.6+0xab7ec]
[timeout occurred during error reporting in step "printing problematic frame"] after 30 s.
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/hid4java-apd-test/core.12909)
#
# An error report file with more information is saved as:
# /home/ubuntu/hid4java-apd-test/hs_err_pid12909.log
Additional information
I have not observed any of these failure modes on amd64 Windows 10; the loop seems to run forever there, as it should.
However, on Linux it is definitely broken on every platform I tested.
It seems that many issues of this kind are caused by calling native code from multiple Java threads:
https://stackoverflow.com/questions/22491797/java-double-free-or-corruption
https://stackoverflow.com/questions/49628615/understanding-corrupted-size-vs-prev-size-glibc-error
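One way to check this hypothesis before changing anything might be to wrap every call into the native layer in a small guard that counts how often a second thread enters while another is still inside. The sketch below is plain Java and makes no hid4java or hidapi calls; `OverlapGuard` and the `Supplier` standing in for a native call are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical diagnostic: counts overlapping entries into a guarded section. */
final class OverlapGuard {
    // Number of threads currently inside the guarded section.
    private final AtomicInteger inside = new AtomicInteger();
    // Number of entries that found another call already in progress.
    private final AtomicInteger overlaps = new AtomicInteger();

    /** Runs the call and records an overlap if another call is already in progress. */
    <T> T call(Supplier<T> nativeCall) {
        if (inside.incrementAndGet() > 1) {
            overlaps.incrementAndGet();
        }
        try {
            return nativeCall.get();
        } finally {
            inside.decrementAndGet();
        }
    }

    /** Returns how many overlapping entries have been seen so far. */
    int overlapCount() {
        return overlaps.get();
    }
}
```

A non-zero `overlapCount()` after a test run would suggest that guarded calls really do overlap. Note that the guard only sees calls that are routed through it, and that it also counts a nested call from the same thread as an overlap.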
I don't quite understand why hid4java needs any threads in the first place. At least for my use case, all I need is synchronous enumeration and synchronous read and write (with timeout), all of which are synchronous calls in hidapi.
FWIW, I have attached the hs_err log files: hs_err_pid2754.log, hs_err_pid12909.log
For now, I have created a private fork of this repo and removed all thread-based functionality (scan thread, reader thread); with that change, the same infinite loop never crashes the program.
FWIW, with hid4java:0.8.0 I have now also had a fatal error/crash on Windows 10 at least once; hs_err log attached:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd41751b40, pid=22844, tid=27032
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.23+9 (11.0.23+9) (build 11.0.23+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.23+9 (11.0.23+9, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# C 0x00007ffd41751b40
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
With the threadless version I cannot reproduce this anymore, so the current implementation must be violating some thread-safety requirement.
I would be fairly cautious about interacting with libhidapi.so, and even with JNA, in anything but a single-threaded or at least serialized fashion.
Here's a reference for hidapi not being thread-safe: https://github.com/libusb/hidapi/wiki
From the FAQ: hidapi is not thread-safe in general.
How to use hidapi in multithreaded application? https://github.com/libusb/hidapi/issues/45
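As a rough illustration of what "serialized" could mean on the caller's side, the sketch below funnels every call through one dedicated thread using only the Java standard library. `NativeCallSerializer` and the thread name `hid-native` are hypothetical; this is not how hid4java is implemented, and it does nothing about threads that a library starts internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs every submitted call on one dedicated thread. */
final class NativeCallSerializer implements AutoCloseable {
    // A single worker thread, so submitted calls can never run concurrently.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "hid-native");
        t.setDaemon(true);
        return t;
    });

    /**
     * Submits the call to the worker and waits up to timeoutMillis for its result.
     * A timeout only stops the waiting; the call itself keeps running on the worker.
     */
    <T> T call(Callable<T> nativeCall, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        return executor.submit(nativeCall).get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops accepting new calls; calls already queued still run. */
    @Override
    public void close() {
        executor.shutdown();
    }
}
```

With such a helper, the enumeration from the reproduction loop could be issued as `serializer.call(() -> hidServices.getAttachedHidDevices(), 1000)`. This only serializes the application's own calls, so any scan or reader thread inside the library would still reach the native layer on its own.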