Tracing support deadlock
RE: the deadlock - I built the repro as a docker container and found that the deadlock happens when I run the container on ~~my Centos box~~ seemingly all of our Centos 7 machines, but not on another fedora machine so I'm not sure if it points to the specific kernel or perhaps some part of the network configuration. At any rate, I'm glad this isn't widespread and I'll follow up with our IT dept to try and figure this out.
Originally posted by @onelson in https://github.com/getsentry/sentry-rust/issues/180#issuecomment-880104828
Working with my IT dept, they were able to reproduce the issue on our Centos7 hosts having the following kernel:
- 3.10.0-1062.1.2.el7.x86_64
The issue was not present on:
- 5.4.142-1.el7.elrepo.x86_64
which is the current latest LTS, I'm told.
It's not clear which hosts we'll be able to freely update, so I'm hoping there might be something to look at on your end to sidestep whatever the issue might be.
If this might be related to some kind of network config, what openssl version do these systems have? And can you reproduce the deadlock when compiling with another TLS provider?
I built the repro program as a Docker container with openssl 1.1.0l so it would have been the same in each case.
We use native-tls extensively for our reqwest work, and sentry works fine as of v0.23.0 so long as we don't try to use tracing.
Still, I can cook up another docker image for them to test with, selecting a different tls impl.
I've added a Dockerfile to the repo (flattening our base layers and stripping out our internal stuff).
Switching to rustls continues to exhibit the deadlock: https://github.com/LaikaStudios/actix-tracing-sentry-repro/commit/826829e701b7341012871a3e64d7b8b505446520
Hi @onelson! I notice that your example repro case enables the debug-images feature. We're also using CentOS 7 and have been experiencing deadlocks since updating to the latest sentry-rust release (anything since #545, where this feature became enabled by default).
The hang is caused by sending any event to sentry, via tracing or otherwise.
In our environment, running sentry::integrations::debug_images::debug_images(); (before initializing sentry) panics here:
thread 'main' panicked at /net/homedirs/jrray/.cargo/registry/src/gitlab.spimageworks.com-9db14e3de8474184/findshlibs-0.10.2/src/lib.rs:261:14:
attempt to add with overflow
When the debug-images feature is turned on, the first event that is generated triggers initialization of this lazy static:
https://github.com/getsentry/sentry-rust/blob/374c961da2cb9d90a8d0307a763d642650cab3d8/sentry-debug-images/src/integration.rs#L59-L62
Inside the initialization code, the aforementioned function is run and panics. Then, sentry_panic::panic_handler kicks in and wants to send an event to sentry about the panic. Building that event tries to lazily initialize DEBUG_META a second time, on the same thread, while the first initialization is still in progress, resulting in a deadlock.
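For illustration, here is a minimal sketch of one way such a re-entrant initialization could be detected instead of deadlocking. This is not sentry-rust's code: `INITIALIZING`, `InitGuard`, and `debug_meta_with` are hypothetical names, and a `std::sync::OnceLock` stands in for whatever lazy static the crate actually uses. The idea is a thread-local flag that is set while the initializer runs, so a nested call on the same thread (for example from a panic hook) gets `None` rather than blocking on the cell it is already initializing.

```rust
use std::cell::Cell;
use std::sync::OnceLock;

thread_local! {
    // True while this thread is inside the initializer (hypothetical flag).
    static INITIALIZING: Cell<bool> = Cell::new(false);
}

// RAII guard so the flag is cleared even if the initializer panics and unwinds.
struct InitGuard;

impl InitGuard {
    // Returns None if this thread is already initializing (re-entrant call).
    fn enter() -> Option<InitGuard> {
        INITIALIZING.with(|flag| {
            if flag.get() {
                None
            } else {
                flag.set(true);
                Some(InitGuard)
            }
        })
    }
}

impl Drop for InitGuard {
    fn drop(&mut self) {
        INITIALIZING.with(|flag| flag.set(false));
    }
}

// Stand-in for the lazily initialized debug metadata.
static DEBUG_META: OnceLock<Vec<String>> = OnceLock::new();

// Returns the metadata, initializing it with `load` on first use.
// A nested call made from inside `load` on the same thread returns None
// instead of re-entering the OnceLock.
fn debug_meta_with(load: impl FnOnce() -> Vec<String>) -> Option<&'static Vec<String>> {
    if let Some(meta) = DEBUG_META.get() {
        return Some(meta);
    }
    let _guard = InitGuard::enter()?;
    Some(DEBUG_META.get_or_init(load))
}
```

With a guard like this, an event built by the panic handler during initialization would simply go out without debug images. Calls from other threads are unaffected by the flag and wait on the cell as usual.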
In our environment at least, it is not safe to enable the debug-images feature. But since this is now a default feature, it is difficult to avoid it getting inadvertently enabled.
Someone has attempted to contribute a fix for this here, though whether the addition should wrap or saturate is debatable. Unfortunately the PR has gone stale.
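To make the wrapping vs saturating question concrete, here is a small sketch of the options Rust offers for an addition that can overflow. The `base + len` computation and the `segment_end_*` names are hypothetical stand-ins; I have not checked what findshlibs actually adds at that line. A plain `+` panics with "attempt to add with overflow" when overflow checks are on (the default in debug builds), which matches the message above.

```rust
// Hypothetical stand-ins for the overflowing addition; not findshlibs code.

// Reports overflow to the caller instead of panicking.
fn segment_end_checked(base: usize, len: usize) -> Option<usize> {
    base.checked_add(len)
}

// Wraps around modulo 2^N: never panics, but can yield a small, bogus address.
fn segment_end_wrapping(base: usize, len: usize) -> usize {
    base.wrapping_add(len)
}

// Clamps at usize::MAX: never panics, result stays >= base.
fn segment_end_saturating(base: usize, len: usize) -> usize {
    base.saturating_add(len)
}
```

Neither wrapping nor saturating yields a meaningful result once the addition overflows; they only differ in how the wrong answer looks. `checked_add` at least lets the caller skip the offending entry.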