Deadlock when simultaneously accessing an uninitialized RefreshScope bean
Hi!
I'm using version 2.0.0.M9. Sometimes my application runs into a deadlock. This happens when two threads simultaneously access an uninitialized RefreshScope bean.
My application uses the annotations @NewSpan and @Async. Both annotations cause access to a ProbabilityBasedSampler, which is provided as a RefreshScope bean by ZipkinAutoConfiguration.RefreshScopedProbabilityBasedSamplerConfiguration.defaultTraceSampler(SamplerProperties).
Thread 1 creates a Singleton-Bean, with a @PostConstruct calling a sampled method (@Async-annotation). Thread 2 calls a sampled method (@NewSpan-annotation). ProbabilityBasedSampler and SamplerProperties are not yet initialized.
This is what happens:
- Thread 1: tries to create the Singleton, accesses DefaultSingletonBeanRegistry.getSingleton(String, ObjectFactory<?>) and locks DefaultSingletonBeanRegistry.singletonObjects
- Thread 2: calls a @NewSpan-annotated method. This requires the ProbabilityBasedSampler and triggers GenericScope.BeanLifecycleWrapper.getBean() and locks GenericScope.BeanLifecycleWrapper.name
- Thread 2: to create ProbabilityBasedSampler the SamplerProperties are required. The bean factory tries to get the SamplerProperties-Singleton, but has to wait for the lock on DefaultSingletonBeanRegistry.singletonObjects that is held by Thread 1
- Thread 1: the @PostConstruct calls an @Async-annotated method that requires ProbabilityBasedSampler. That triggers GenericScope.BeanLifecycleWrapper.getBean(), but has to wait for the lock GenericScope.BeanLifecycleWrapper.name held by Thread 2
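The four steps above are a classic lock-order inversion. The following is only a minimal sketch of that pattern in plain Java, not the Spring code itself: `registryLock` and `scopedBeanLock` are hypothetical stand-ins for DefaultSingletonBeanRegistry.singletonObjects and GenericScope.BeanLifecycleWrapper.name, and the class and method names are made up for illustration. A latch forces both threads to hold their first lock before either asks for the second, so the deadlock is deterministic here, whereas in the real application it depends on timing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderInversionDemo {

    /** Returns true once both threads are deadlocked on each other's monitor. */
    static boolean deadlocks() {
        // Hypothetical stand-ins for the two monitors named in the report.
        final Object registryLock = new Object();
        final Object scopedBeanLock = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Thread 1 locks registry -> scoped bean, thread 2 the other way round.
        Thread t1 = new Thread(
                () -> lockBoth(registryLock, scopedBeanLock, bothHoldFirstLock), "thread-1");
        Thread t2 = new Thread(
                () -> lockBoth(scopedBeanLock, registryLock, bothHoldFirstLock), "thread-2");
        // Daemon threads, so the JVM can still exit although they never finish.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null if no cycle found
            if (ids != null && contains(ids, t1.getId()) && contains(ids, t2.getId())) {
                return t1.getState() == Thread.State.BLOCKED
                        && t2.getState() == Thread.State.BLOCKED;
            }
            pause(50);
        }
        return false;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and waits for 'first'.
            }
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlocks());
    }
}
```

While the two threads are stuck, a thread dump of the process (for example taken with jstack) should list both as BLOCKED on the monitor owned by the other thread.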
I attach a class that simulates the behaviour of my application setup and the stack trace when this class runs into the deadlock: RefreshScopeDeadLock.zip
Please try with the latest, RC2.
Same result (the affected code didn't change as far as I can see). I attached the updated stack trace (the line numbers of GenericScope.BeanLifecycleWrapper were wrong in the old version): RefreshScopeDeadLock-2.0.0.RC2.zip
By the way: it doesn't affect my issue, but as far as I can see, GenericScope.BeanLifecycleWrapper.getBean() uses double-checked locking without a volatile synchronization object. As far as I know, this is not thread-safe.
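For reference, this is a minimal sketch of the double-checked locking idiom as it is usually considered safe under the Java memory model, where the field that is read outside the lock is declared volatile. `LazyHolder` is a hypothetical class written for illustration; it is not a copy of GenericScope.BeanLifecycleWrapper, and I have not checked how the real field is declared.

```java
import java.util.function.Supplier;

final class LazyHolder<T> {

    private final Object lock = new Object();
    private final Supplier<T> factory;

    // volatile: a thread that sees a non-null reference outside the lock
    // also sees the fully constructed object behind it.
    private volatile T value;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = value;             // first check, without locking
        if (local == null) {
            synchronized (lock) {
                local = value;       // second check, under the lock
                if (local == null) {
                    local = factory.get();
                    value = local;   // publish via the volatile write
                }
            }
        }
        return local;
    }
}
```

The local variable only avoids a second volatile read on the fast path. Note that this addresses safe publication only; it does nothing about the lock ordering that causes the deadlock.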
I'm unable to see the BLOCKED state. What do I need to do to see it?
I simplified the class and added some documentation and output. I hope it's now clear what happens: RefreshScopeDeadLock-2.0.0.RC2-v2.zip
I think we can safely call this a bug. But I'm not sure where to start fixing it. Arguably, there is no point having Sleuth applied to an @Async method called from @PostConstruct of a singleton, and if it wasn't, the deadlock would never happen in a vanilla Spring Boot app, as I understand it. On the other hand, the sample app doesn't use Sleuth, and it shows that users can create the conditions that trigger the deadlock relatively easily. The fact that one lock belongs to Spring Framework and the other to Spring Cloud makes it hard to come up with a compromise.
I put that change on a branch because I'm not sure it's finished (there's no test for instance), but I think it fixes this issue.
@dsyer - we are running into a similar issue. Can you please give me the details of the branch so I can give it a shot?
The change is linked to above my last comment.
Is this still an issue, or is it fixed in the latest version?
The problem still occurs with:
- spring-boot: 2.1.6.RELEASE
- spring-core: 5.1.15.RELEASE
- spring-cloud-commons: 2.1.0.RELEASE
Is a fix for this issue expected in the near future?