runtime icon indicating copy to clipboard operation
runtime copied to clipboard

Test failing: ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled

Open steveharter opened this issue 1 year ago • 7 comments

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=719926 Build error leg or test failing: Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled Pull request: https://github.com/dotnet/runtime/pull/103968

  Starting:    Microsoft.Extensions.DependencyInjection.Tests (parallel test collections = on [2 threads], stop on fail = off)
    Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled [FAIL]
      Assert.True() Failure
      Expected: True
      Actual:   False
      Stack Trace:
        /_/src/libraries/Microsoft.Extensions.DependencyInjection/tests/DI.Tests/ServiceProviderContainerTests.cs(398,0): at Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled()

Error Message

{
  "ErrorMessage": "Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled [FAIL]",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=719926 Error message validated: [Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled [FAIL]] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 6/26/2024 2:56:25 PM UTC

Report

Build Definition Test Pull Request
772017 dotnet/runtime Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled dotnet/runtime#105403

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 1

steveharter avatar Jun 26 '24 14:06 steveharter

Tagging subscribers to this area: @dotnet/area-extensions-dependencyinjection See info in area-owners.md if you want to be subscribed.

Perhaps this is a case where the IL Emit version of the implementation is created after a certain number of calls (e.g. 3), and there is a concurrency issue with the switch to IL?

steveharter avatar Jul 30 '24 18:07 steveharter

Failed in: runtime-coreclr libraries-pgo 20240804.1

Failed tests:

net9.0-windows-Release-x86-jitosr_stress-Windows.10.Amd64.Open
    - Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderILEmitContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled

Error message:

 Assert.True() Failure
Expected: True
Actual:   False

Stack trace:

   at Microsoft.Extensions.DependencyInjection.Tests.ServiceProviderContainerTests.GetAsyncService_DisposeAsyncOnSameThread_ThrowsAndDoesNotHangAndDisposeAsyncGetsCalled() in /_/src/libraries/Microsoft.Extensions.DependencyInjection/tests/DI.Tests/ServiceProviderContainerTests.cs:line 384
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args) in /_/src/coreclr/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.CoreCLR.cs:line 36
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57

v-wenyuxu avatar Aug 05 '24 01:08 v-wenyuxu

@steveharter does your theory imply a regression or an existing issue in either DI or reflection?

ericstj avatar Aug 06 '24 15:08 ericstj

@steveharter does your theory imply a regression or an existing issue in either DI or reflection?

I don't think that theory is valid after looking at this more.

The PR that added the test and it uses the problematic sync-over-async approach to avoid deadlocks.

Also, there is a timeout of 10 seconds, but that seems plenty - my machine runs that in a millisecond or so I don't think it's a timeout issue. It's more likely related to sync-over-async.

I ran the tests locally 2,000 times without issues.

Still investigating; not sure if this occurred in the more distant past but no issue created for it.

steveharter avatar Aug 10 '24 19:08 steveharter

The most recent change that directly modified this code is https://github.com/dotnet/runtime/pull/53325/files which was in v6.0.

steveharter avatar Aug 12 '24 15:08 steveharter

Moving to v10; this may be a test-only issue. If not, the workaround is to add a synchronous Dispose().

steveharter avatar Aug 13 '24 14:08 steveharter

@steveharter I did some digging. The main take away is that the tests in this assembly are susceptible to thread pool starvation. The reason being the many "sync over async" uses in the tests AND the implementation.

Reasoning

  • The agent running the tests has only 2 cores as seen from the log: Starting: Microsoft.Extensions.DependencyInjection.Tests (parallel test collections = on [2 threads], stop on fail = off)
  • I was able to reproduce the test failure by artificially constraining the thread pool.
  • This test is particularly susceptible as it consumes three thread pool threads by itself.
    • A blocking Wait inside of the test method (source)
    • SingleThreadedSynchronizationContext.Run inside Task.Run (source)
    • DisposeAsync inside Task.Run blocking SingleThreadedSynchronizationContext.Run (source)
  • This test has a timeout. This means that if it takes too long for the thread pool to spool up threads to replace the blocked threads it fails. Other tests with similar issues have no timeouts meaning they just run much slower than necessary but do not fail.

Options:

  1. Remove the timeout. It will complete ... eventually.
  2. Manually resize the thread pool to ensure enough threads are readily available even on resource constrained systems. This might degrade performance due to unnecessary context switches.
  3. Change the test by providing a Dispose method. This degrades test coverage.
  4. Reduce the "sync over async" uses in the tests. This is quite a bit of effort.

koenigst avatar Sep 06 '24 09:09 koenigst