sentry-rust

Tracing support deadlock

Open onelson opened this issue 4 years ago • 5 comments

RE: the deadlock - I built the repro as a Docker container and found that the deadlock happens when I run the container on ~~my CentOS box~~ seemingly all of our CentOS 7 machines, but not on another Fedora machine, so I'm not sure whether it points to the specific kernel or perhaps some part of the network configuration. At any rate, I'm glad this isn't widespread, and I'll follow up with our IT dept to try to figure this out.

Originally posted by @onelson in https://github.com/getsentry/sentry-rust/issues/180#issuecomment-880104828


I worked with my IT dept, and they were able to reproduce the issue on our CentOS 7 hosts running the following kernel:

  • 3.10.0-1062.1.2.el7.x86_64

The issue was not present on:

  • 5.4.142-1.el7.elrepo.x86_64 which is the current latest LTS, I'm told.

It's not clear which hosts we'll be able to freely update, so I'm hoping there might be something to look at on your end to sidestep whatever the issue might be.

onelson avatar Aug 20 '21 01:08 onelson

If this might be related to some kind of network config, what OpenSSL version do these systems have? And can you reproduce the deadlock when compiling with another TLS provider?

Swatinem avatar Aug 20 '21 07:08 Swatinem

I built the repro program as a Docker container with OpenSSL 1.1.0l, so it would have been the same in each case.

We use native-tls extensively for our reqwest work, and sentry works fine as of v0.23.0 so long as we don't try to use tracing.

Still, I can cook up another Docker image for them to test with, selecting a different TLS impl.

onelson avatar Aug 20 '21 16:08 onelson

I've added a Dockerfile to the repo (flattening our base layers and stripping out our internal stuff).

Switching to rustls continues to exhibit the deadlock: https://github.com/LaikaStudios/actix-tracing-sentry-repro/commit/826829e701b7341012871a3e64d7b8b505446520

onelson avatar Aug 20 '21 18:08 onelson

Hi @onelson! I notice that your example repro case has the debug-images feature enabled. We're also using CentOS 7 and have been experiencing deadlocks since updating to the latest sentry-rust release. This affects anything since #545, where this feature got enabled by default.

The hang is caused by sending any event to sentry, via tracing or otherwise.

In our environment, running sentry::integrations::debug_images::debug_images(); (before initializing sentry) panics here:

thread 'main' panicked at /net/homedirs/jrray/.cargo/registry/src/gitlab.spimageworks.com-9db14e3de8474184/findshlibs-0.10.2/src/lib.rs:261:14:
attempt to add with overflow

When the debug-images feature is turned on, the first time an event is generated, sentry attempts to initialize this lazy static:

https://github.com/getsentry/sentry-rust/blob/374c961da2cb9d90a8d0307a763d642650cab3d8/sentry-debug-images/src/integration.rs#L59-L62

Inside the initialization code, the aforementioned function is run and panics. Then, sentry_panic::panic_handler kicks in and wants to send an event to sentry about the panic. Because the first initialization has not finished yet, this causes a recursive attempt to lazily initialize DEBUG_META again, resulting in a deadlock.
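To illustrate the failure mode, here is a minimal sketch of one way to make a lazy static tolerate this kind of re-entrancy. It is not the sentry-rust implementation: `debug_meta_with`, `INITIALIZING`, and the `Vec<String>` payload are all hypothetical stand-ins. The idea is to run the loader outside the lock, catch a panicking loader, and have a re-entrant call on the same thread return `None` instead of blocking.

```rust
use std::cell::Cell;
use std::panic;
use std::sync::OnceLock;

thread_local! {
    // Hypothetical guard: true while this thread is running the loader.
    static INITIALIZING: Cell<bool> = Cell::new(false);
}

// Stand-in for the lazily initialized debug metadata (assumed shape).
static DEBUG_META: OnceLock<Vec<String>> = OnceLock::new();

/// Returns the cached metadata, or `None` when called re-entrantly from
/// inside the loader on the same thread (e.g. from a panic hook).
fn debug_meta_with(load: fn() -> Vec<String>) -> Option<&'static Vec<String>> {
    if let Some(meta) = DEBUG_META.get() {
        return Some(meta);
    }
    // `replace` returns the previous value; `true` means we are already
    // inside the loader on this thread, so skip instead of blocking.
    if INITIALIZING.with(|flag| flag.replace(true)) {
        return None;
    }
    // Run the loader outside of the OnceLock so a panic or a re-entrant
    // call never happens while the cell is mid-initialization. A panicking
    // loader falls back to an empty list. Two threads may both run the
    // loader; only the first stored result is kept.
    let loaded = panic::catch_unwind(load).unwrap_or_default();
    INITIALIZING.with(|flag| flag.set(false));
    Some(DEBUG_META.get_or_init(|| loaded))
}
```

With this shape, a panic hook that asks for the metadata while the loader is still running gets `None` and can send its event without debug images. Note that `catch_unwind` only helps when the binary is built with unwinding panics, not `panic = "abort"`.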

In our environment at least, it is not safe to enable the debug-images feature. But since this is now a default feature, it is difficult to avoid it getting inadvertently enabled.

jrray avatar Mar 01 '24 23:03 jrray

Someone has attempted to contribute a fix for this here, though whether wrapping or saturating is the right behavior is debatable. Unfortunately, the PR has gone stale.
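For reference, here is a small sketch of how the options differ, using Rust's standard integer methods. The `segment_end_*` helpers and the base-plus-length framing are hypothetical and are not taken from findshlibs. A plain `+` panics with "attempt to add with overflow" only when overflow checks are enabled, which is the default in debug builds.

```rust
/// Hypothetical helper: returns `None` on overflow so the caller decides.
fn segment_end_checked(base: usize, len: usize) -> Option<usize> {
    base.checked_add(len)
}

/// Hypothetical helper: wraps around modulo 2^N on overflow.
fn segment_end_wrapping(base: usize, len: usize) -> usize {
    base.wrapping_add(len)
}

/// Hypothetical helper: clamps to `usize::MAX` on overflow.
fn segment_end_saturating(base: usize, len: usize) -> usize {
    base.saturating_add(len)
}
```

For example, with `base = usize::MAX - 1` and `len = 4`, the checked variant returns `None`, the wrapping variant returns `2`, and the saturating variant returns `usize::MAX`. None of the three panics, so any of them would avoid triggering the panic handler in the first place.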

jrray avatar Mar 01 '24 23:03 jrray