
Deadlock during ORB shutdown

Open okummer opened this issue 4 years ago • 4 comments

I found the following two threads in a server with stuck JMX calls:

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.jmxRegistrationDebug(ManagedObjectManagerImpl.java:1225)
        - waiting to lock <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.MBeanImpl.unregister(MBeanImpl.java:315)
        - locked <0x00000000e3df4ab8> (a org.glassfish.gmbal.impl.MBeanImpl)
        at org.glassfish.gmbal.impl.JMXRegistrationManager.unregister(JMXRegistrationManager.java:201)
        - locked <0x00000000e3f2e628> (a java.lang.Object)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:383)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:378)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.clear(MBeanTree.java:419)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.init(ManagedObjectManagerImpl.java:322)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.close(ManagedObjectManagerImpl.java:344)
        at com.sun.corba.ee.impl.orb.ORBImpl.destroy(ORBImpl.java:1516)
        at ...

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.MBeanTree.getMBeanImpl(MBeanTree.java:413)
        - waiting to lock <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.getFacetAccessor(ManagedObjectManagerImpl.java:746)
        - locked <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:435)
        at org.glassfish.gmbal.impl.TypeConverterImpl$TypeConverterListBase.toManagedEntity(TypeConverterImpl.java:900)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:436)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttribute(MBeanSkeleton.java:526)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttributes(MBeanSkeleton.java:572)
        at org.glassfish.gmbal.impl.MBeanImpl.getAttributes(MBeanImpl.java:362)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes([email protected]/Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes([email protected]/Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation([email protected]/Unknown Source)
        at ...

The two threads are trying to obtain locks on MBeanTree and ManagedObjectManagerImpl in an inconsistent order, leading to a deadlock. This prevents the ORB (and hence the server) from shutting down.
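
For readers who want to see the pattern in isolation, here is a minimal sketch of the same lock-order inversion. It is not gmbal code: the class `LockOrderInversionDemo`, its two lock objects, and the thread names are hypothetical stand-ins for the two monitors in the dump above. A latch forces both threads to hold their first monitor before asking for the second, and the JVM's own deadlock detection (`ThreadMXBean.findMonitorDeadlockedThreads()`) then reports the cycle. The threads are daemons so the demo JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins for the two monitors in the thread dump; not gmbal code.
class LockOrderInversionDemo {
    private final Object treeLock = new Object();     // plays the role of MBeanTree
    private final Object managerLock = new Object();  // plays the role of ManagedObjectManagerImpl

    // Makes sure both threads hold their first monitor before either requests the second.
    private final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    private void lockInOrder(Object first, Object second) {
        synchronized (first) {
            bothHoldFirstLock.countDown();
            try {
                bothHoldFirstLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread owns 'second' and is waiting for 'first'.
            }
        }
    }

    /** Starts two daemon threads with opposite lock orders and returns how many
     *  threads the JVM reports as monitor-deadlocked (0 if none within about 5 s). */
    int countDeadlockedThreads() {
        Thread shutdownPath = new Thread(() -> lockInOrder(treeLock, managerLock), "shutdown-path");
        Thread jmxReadPath = new Thread(() -> lockInOrder(managerLock, treeLock), "jmx-read-path");
        shutdownPath.setDaemon(true);
        jmxReadPath.setDaemon(true);
        shutdownPath.start();
        jmxReadPath.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 500; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }
}
```

Calling `new LockOrderInversionDemo().countDeadlockedThreads()` should report both demo threads as deadlocked. In the real server there is no latch, which is why the interleaving is so rare.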

okummer avatar Dec 02 '21 12:12 okummer

Thanks for the report! Which server did this concern, and what version of it?

arjantijms avatar Dec 02 '21 13:12 arjantijms

This report applies to Glassfish 4.2.2. The server is a custom Java application that uses CORBA for outgoing connections and that is monitored over JMX. The other end of the CORBA connection also runs on Glassfish 4.2.2.

okummer avatar Dec 02 '21 14:12 okummer

I guess it's not easy to try to reproduce this on the current version of GlassFish?

The code for GlassFish 4.x wasn't transferred to Eclipse, and GlassFish 4.x is essentially unsupported.

arjantijms avatar Dec 03 '21 13:12 arjantijms

This is probably a rare bug, which we observed once during thousands or tens of thousands of shutdowns. I have little hope that I can reproduce it under controlled conditions.

But as I looked into the code, I see that the affected classes actually stem from https://github.com/eclipse-ee4j/orb-gmbal and not from this exact repo. Should I recreate my issue there?

Over there, the code on the main branch and the line numbers have not changed since 4.0.0 (the release used by 4.2.2 of the ORB). There is still the pattern that a thread synchronized on ManagedObjectManagerImpl may want to synchronize on MBeanTree and that a thread synchronized on MBeanTree may want to synchronize on ManagedObjectManagerImpl.

In my specific case, the access to org.glassfish.gmbal.impl.ManagedObjectManagerImpl#jmxRegistrationDebugFlag in jmxRegistrationDebug() would not have to be synchronized. It would be sufficient to make the field jmxRegistrationDebugFlag volatile to enforce correct concurrency semantics. This would break the cycle.
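
A rough sketch of what I mean, assuming the flag is a plain boolean that is only read and written, never updated as part of a larger compound operation. The class `DebugFlagHolder`, the setter name, and the helper `readWhileMonitorIsHeld()` are hypothetical and only illustrate the idea; this is not the gmbal source. The helper lets another thread own the object's monitor and then reads the flag, which would block with a synchronized accessor but returns immediately with a volatile field.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified stand-in for the flag access discussed above; not the gmbal source.
class DebugFlagHolder {
    // volatile makes plain reads and writes visible across threads,
    // so neither accessor needs this object's monitor.
    private volatile boolean jmxRegistrationDebugFlag;

    void setJmxRegistrationDebug(boolean enabled) {
        jmxRegistrationDebugFlag = enabled;
    }

    boolean jmxRegistrationDebug() {
        return jmxRegistrationDebugFlag;
    }

    /** Reads the flag while another thread owns the holder's monitor. */
    static boolean readWhileMonitorIsHeld() {
        DebugFlagHolder holder = new DebugFlagHolder();
        holder.setJmxRegistrationDebug(true);
        CountDownLatch monitorHeld = new CountDownLatch(1);
        CountDownLatch readDone = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            synchronized (holder) {
                monitorHeld.countDown();
                try {
                    readDone.await(); // keep the monitor until the read has finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "monitor-owner");
        owner.setDaemon(true);
        owner.start();

        try {
            monitorHeld.await();
            // A synchronized accessor would block here; the volatile read does not.
            return holder.jmxRegistrationDebug();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            readDone.countDown();
        }
    }
}
```

With this shape, a thread that already holds MBeanTree no longer needs the ManagedObjectManagerImpl monitor just to check the debug flag.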

There might be other cycles, but those that I could find immediately are harmless: org.glassfish.gmbal.impl.MBeanTree#setRoot calls org.glassfish.gmbal.impl.ManagedObjectManagerImpl#constructMBean, but only while it is already synchronized on ManagedObjectManagerImpl, so that's fine. MBeanImpl makes no other direct calls to ManagedObjectManagerImpl that I can find. Calls through the MBeanSkeleton might be problematic due to a reference back to the ManagedObjectManagerInternal, but this reference is probably only used in the analyze phase and not when answering calls from the MBeanImpl.

Long story short: It might well be that removing the synchronization for jmxRegistrationDebug() actually breaks the loop.

okummer avatar Dec 03 '21 16:12 okummer