spring-cloud-commons

Deadlock on simultaneously accessing uninitialized RefreshScope-Bean

Open schabe77 opened this issue 7 years ago • 11 comments

Hi!

I'm using version 2.0.0.M9. Sometimes my application runs into a deadlock. This happens when two threads simultaneously access an uninitialized RefreshScope bean.

My application uses the annotations @NewSpan and @Async. Both annotations cause access to a ProbabilityBasedSampler, which is provided as a RefreshScope bean by ZipkinAutoConfiguration.RefreshScopedProbabilityBasedSamplerConfiguration.defaultTraceSampler(SamplerProperties).

Thread 1 creates a Singleton-Bean, with a @PostConstruct calling a sampled method (@Async-annotation). Thread 2 calls a sampled method (@NewSpan-annotation). ProbabilityBasedSampler and SamplerProperties are not yet initialized.

This is what happens:

  1. Thread 1: tries to create the Singleton, accesses DefaultSingletonBeanRegistry.getSingleton(String, ObjectFactory<?>) and locks DefaultSingletonBeanRegistry.singletonObjects

  2. Thread 2: calls a @NewSpan-annotated method. This requires the ProbabilityBasedSampler and triggers GenericScope.BeanLifecycleWrapper.getBean() and locks GenericScope.BeanLifecycleWrapper.name

  3. Thread 2: to create ProbabilityBasedSampler the SamplerProperties are required. The bean factory tries to get the SamplerProperties-Singleton, but has to wait for the lock on DefaultSingletonBeanRegistry.singletonObjects that is held by Thread 1

  4. Thread 1: the @PostConstruct calls an @Async-annotated method that requires ProbabilityBasedSampler. That triggers GenericScope.BeanLifecycleWrapper.getBean(), but has to wait for the lock on GenericScope.BeanLifecycleWrapper.name held by Thread 2
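The four steps above are a classic lock-order inversion. As a minimal sketch (not the attached class, and not Spring code), the following reproduces the same pattern with two plain locks. The names `singletonObjects` and `scopedBeanName` are hypothetical stand-ins for the two monitors, and `ReentrantLock.tryLock` with a timeout replaces `synchronized` only so that the demo terminates instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {

    // Hypothetical stand-ins for DefaultSingletonBeanRegistry.singletonObjects
    // and GenericScope.BeanLifecycleWrapper.name.
    static final ReentrantLock singletonObjects = new ReentrantLock();
    static final ReentrantLock scopedBeanName = new ReentrantLock();

    /** Returns how many of the two threads could not obtain their second lock. */
    static int run() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Thread 1: singleton lock first, then the scoped-bean lock.
        Thread t1 = new Thread(() ->
                acquire(singletonObjects, scopedBeanName, bothHoldFirst, bothTried, blocked));
        // Thread 2: the same two locks in the opposite order.
        Thread t2 = new Thread(() ->
                acquire(scopedBeanName, singletonObjects, bothHoldFirst, bothTried, blocked));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    static void acquire(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock (steps 1 and 2).
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // Steps 3 and 4: each thread now needs the lock the other one holds.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // with synchronized this would block forever
            }
            // Keep holding the first lock until the other thread has tried as well.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + run());
    }
}
```

Both threads time out on their second lock, which is the point where the real application stays BLOCKED, because a `synchronized` block has no timeout.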

I attach a class that simulates the behaviour of my application setup and the stack trace when this class runs into the deadlock: RefreshScopeDeadLock.zip

schabe77 avatar Jun 13 '18 11:06 schabe77

Please try with the latest, RC2.

spencergibb avatar Jun 13 '18 11:06 spencergibb

Same result (the affected code didn't change as far as I can see). I attached the updated stack trace (the line numbers of GenericScope.BeanLifecycleWrapper were wrong in the old version): RefreshScopeDeadLock-2.0.0.RC2.zip

By the way: it doesn't have an effect on my issue, but as far as I can see, GenericScope.BeanLifecycleWrapper.getBean() uses double-checked locking without a volatile synchronization object. As far as I know, this is not thread-safe.
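For reference, a minimal sketch of double-checked locking with a volatile field looks roughly like this. `LazyBeanHolder` and its members are hypothetical names for illustration, not the actual GenericScope code:

```java
import java.util.function.Supplier;

class LazyBeanHolder<T> {

    private final Supplier<T> factory;

    // volatile makes the write to this field a safe publication: a thread that
    // reads a non-null value also sees the fully constructed object behind it.
    private volatile T bean;

    LazyBeanHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = bean;               // first check, without locking
        if (local == null) {
            synchronized (this) {
                local = bean;         // second check, under the lock
                if (local == null) {
                    local = factory.get();
                    bean = local;     // publish through the volatile field
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        LazyBeanHolder<String> holder = new LazyBeanHolder<>(() -> "sampler");
        System.out.println(holder.get());
    }
}
```

Without `volatile`, the Java memory model permits another thread to observe a non-null reference to an object whose fields are not yet visible. Note that this only addresses safe publication; it does not change the lock ordering that causes the deadlock.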

schabe77 avatar Jun 13 '18 12:06 schabe77

I'm unable to see the BLOCKED state. What do I need to do to see it?

spencergibb avatar Jun 13 '18 20:06 spencergibb

I simplified the class and added some documentation and output. I hope it's now clear what happens: RefreshScopeDeadLock-2.0.0.RC2-v2.zip

schabe77 avatar Jun 14 '18 07:06 schabe77

I think we can safely call this a bug. But I'm not sure where to start fixing it. Arguably, there is no point having Sleuth applied to an @Async method called from @PostConstruct of a singleton, and if it wasn't, the deadlock would never happen in a vanilla Spring Boot app, as I understand it. On the other hand, the sample app doesn't use Sleuth, and it shows that users can create the conditions that trigger the deadlock relatively easily. The fact that one lock belongs to Spring Framework and the other to Spring Cloud makes it hard to come up with a compromise.

dsyer avatar Jun 19 '18 10:06 dsyer

I put that change on a branch because I'm not sure it's finished (there's no test for instance), but I think it fixes this issue.

dsyer avatar Jul 09 '18 12:07 dsyer

@dsyer, we are running into a similar issue. Can you please provide details of the branch so I can give it a shot?

durgadeep avatar Jul 18 '18 17:07 durgadeep

The change is linked above my last comment.

dsyer avatar Jul 18 '18 17:07 dsyer

Is this still an issue, or is it fixed in the latest version?

jrramp avatar Sep 01 '20 16:09 jrramp

The problem still occurs with:

  • spring-boot: 2.1.6.RELEASE
  • spring-core: 5.1.15.RELEASE
  • spring-cloud-commons: 2.1.0.RELEASE

Is a fix for this issue expected in the near future?

rumter avatar Dec 23 '21 13:12 rumter