vstest icon indicating copy to clipboard operation
vstest copied to clipboard

Random crashes of the test host when using `System.Drawing`

Open Erikvl87 opened this issue 4 years ago • 7 comments

Description

We are observing random crashes of the test host process when using System.Drawing. This only seems to happen when having two different tests, each creating and using their own Graphics object.

Steps to reproduce

This only seems to be reproducible in a Linux environment. Make two test classes containing the following code:

private const string FontFamilyName = "Calibri";
public static int FontSize = 11;

[Fact]
public void TestMethod1()
{
	var graphics = Graphics.FromImage(new Bitmap(1, 1));
	var font = new Font(FontFamilyName, FontSize);
	var stringSize = graphics.MeasureString("measure me", font);
	graphics.Dispose();
}

Execute the tests and repeat that 10 to 15 times. Observe that this will fail randomly.

I've created a small repo to clone to reproduce the issue: https://github.com/Erikvl87/vstest-issue-3085

Expected behavior

Expected all tests to consistently succeed.

Actual behavior

Tests succeed or fail at random: Output from the dotnet test command:

Test run for /app/MyTests/bin/Release/netcoreapp3.1/MyTests.dll(.NETCoreApp,Version=v3.1)
Microsoft (R) Test Execution Command Line Tool Version 16.7.1-release-20201002-01
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
Logging Vstest Diagnostics in file: /app/diag.log

A total of 1 test files matched the specified pattern.
/app/MyTests/bin/Release/netcoreapp3.1/MyTests.dll
Data collection : Logging DataCollector Diagnostics in file: /app/diag.datacollector.21-10-01_13-47-00_12766_4.log
Logging TestHost Diagnostics in file: /app/diag.host.21-10-01_13-47-00_45431_4.log
[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.3+1b45f5407b (64-bit .NET Core 3.1.19)
[xUnit.net 00:00:00.43]   Discovering: MyTests
[xUnit.net 00:00:00.47]   Discovered:  MyTests
[xUnit.net 00:00:00.48]   Starting:    MyTests
The active test run was aborted. Reason: Test host process crashed

Attachments:
  /app/MyTests/TestResults/0ab27ee2-162f-4b5c-81c8-8806bdbd43fd/Sequence_24439200ad1b4316b57a80f5ab10cbb7.xml
Test Run Aborted.
WARNING: Overwriting results file: /app/MyTests/TestResults/results.trx
Results File: /app/MyTests/TestResults/results.trx

The active Test Run was aborted because the host process exited unexpectedly while executing following test(s):
MyTests.Tests1.TestMethod1

Diagnostic logs

Errors extracted from the datacollector logs:

TpTrace Error: 0 : 467, 1, 2021/09/30, 13:59:55.333, 18341584833600, datacollector.dll, The type "Microsoft.TestPlatform.Extensions.EventLogCollector.EventLogDataCollector" defined in "Microsoft.TestPlatform.Extensions.EventLogCollector.dll" does not have ExtensionUri attribute.
TpTrace Error: 0 : 467, 1, 2021/09/30, 13:59:55.335, 18341586607500, datacollector.dll, The type "Microsoft.TestPlatform.Extensions.BlameDataCollector.BlameCollector" defined in "Microsoft.TestPlatform.Extensions.BlameDataCollector.dll" does not have ExtensionUri attribute.
TpTrace Error: 0 : 467, 1, 2021/09/30, 13:59:55.337, 18341588536800, datacollector.dll, The type "Microsoft.TestPlatform.Extensions.BlameDataCollector.BlameCollector" defined in "Microsoft.TestPlatform.Extensions.BlameDataCollector.dll" does not have ExtensionUri attribute.
TpTrace Error: 0 : 467, 8, 2021/09/30, 13:59:57.813, 18344064970900, datacollector.dll, DataCollectionAttachmentManager.GetAttachments: Fail to get attachments: System.Collections.Generic.KeyNotFoundException: The given key 'Microsoft.VisualStudio.TestPlatform.ObjectModel.DataCollection.DataCollectionContext' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Microsoft.VisualStudio.TestPlatform.Common.DataCollector.DataCollectionAttachmentManager.GetAttachments(DataCollectionContext dataCollectionContext) 
TpTrace Information: 0 : 467, 8, 2021/09/30, 13:59:57.816, 18344067474600, datacollector.dll, DataCollectionTestCaseEventHandler: Test case 'FontoXml.F4B.Backend.Conversion.Tests.SpecialCharacters.PwithHyphen - 5fc7633c-161f-3e6e-fdd2-8f05b8827dfb' completed
TpTrace Error: 0 : 467, 8, 2021/09/30, 13:59:57.816, 18344067938100, datacollector.dll, DataCollectionRequestHandler.HandleBeforeTestRunStart : Error occurred during initialization of TestHost : System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.SocketCommunicationManager.ReceiveRawMessage()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.SocketCommunicationManager.ReceiveMessage()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.DataCollection.DataCollectionTestCaseEventHandler.ProcessRequests()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.DataCollection.DataCollectionRequestHandler.<HandleBeforeTestRunStart>b__23_0()

Errors extracted from the dotnet test --diag file:

TpTrace Error: 0 : 694, 13, 2021/09/30, 14:23:54.321, 19780573026300, vstest.console.dll, Socket: Message loop: failed to receive message due to socket error System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken), remoteEndPoint: 127.0.0.1:57010 localEndPoint: 127.0.0.1:45677
TpTrace Error: 0 : 694, 4, 2021/09/30, 14:23:54.322, 19780573637100, vstest.console.dll, TestHostManagerCallbacks.ExitCallBack: Testhost processId: 723 exited with exitcode: 139 error: ''
TpTrace Error: 0 : 694, 13, 2021/09/30, 14:23:54.330, 19780581485900, vstest.console.dll, TestRequestSender: Aborting test run because Test host process crashed

Environment

Running on .NET Core 3.1.19, in the Docker base image mcr.microsoft.com/dotnet/sdk:3.1.413-alpine3.14. See https://github.com/Erikvl87/vstest-issue-3085 for a reproducible case in a Docker container.

Erikvl87 avatar Oct 01 '21 14:10 Erikvl87

System.Drawing on non Windows depends on libgdiplus, which is not actively maintained any more and is not reliable. For this reason we (owners of System.Drawing) are deprecating it on non Windows. https://docs.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/6.0/system-drawing-common-windows-only

My guess is that you are hitting some of that flakiness, and it is randomly crashing your test execution process. the errors you pasted above are just what the runner does when its child terminates suddenly.

danmoseley avatar Oct 28 '21 02:10 danmoseley

Thank you for clarifing, @danmoseley. We've migrated most of our code to ImageSharp and that at least fixes our issues with randomly failing tests.

Apart from the cause of the crash, which now seems obviously something in System.Drawing, how would I've been able to pinpoint System.Drawing as the cause of the crashing test execution process given the produced logs? I've been removing code over and over until I finally found out this had something to do with System.Drawing. Is there any room to improve the logs that have been produced to at least point to libgdiplus as the culprit?

Erikvl87 avatar Nov 03 '21 14:11 Erikvl87

@Erikvl87 in the case of a segfault, the runtime likely simply terminates and the only possibility is to enable crash dumps on the OS and run again - then look at the dump.

@nohwnd any thoughts? Does the runner already pipe through everything it can from the child process?

Also, presumably the runner can say which test was being executed at the time of the crash -- that isn't lost?

danmoseley avatar Nov 03 '21 14:11 danmoseley

Also, presumably the runner can say which test was being executed at the time of the crash -- that isn't lost?

@danmoseley, the runner did tell us which test crashed, that wasn't lost. But we've had multiple (integration) tests that were using System.Drawing somewhere. Since they only failed randomly and occasionally, and it were integration tests, we've had a hard time finding the culprit.

Erikvl87 avatar Nov 03 '21 14:11 Erikvl87

@Erikvl87 Here is a PR https://github.com/Erikvl87/vstest-issue-3085/pull/1 that changes your repo to use .NET5 that has dumping capabilities built in.( I think you could also do directly on netcoreapp3.1 using env variables if you really need to avoid upgrading.)

Then the dump from attachments can be analyzed using dotnet-dump tool I added to your image:

./start.sh
<wait for crash, it won't close the container anymore>
dotnet-dump analyze <path  to dmp from attachments>

Now you are in the dotnet-dump command line, you can print parallel stacks to see what is happening and should see that xUnit is running one of your tests:

parallelstacks

<find   ~~~~ <thread id, for me 36b>
                             1 (dynamicClass).IL_STUB_PInvoke(HandleRef, String, Int32, HandleRef, RectangleF ByRef, HandleRef, RectangleF ByRef, Int32 ByRef, Int32 ByRef)
                             1 System.Drawing.Graphics.MeasureString(String, Font, SizeF, StringFormat)
                             1 System.Drawing.Graphics.MeasureString(String, Font, SizeF)
                             1 System.Drawing.Graphics.MeasureString(String, Font)
                             1 MyTests.Tests2.TestMethod2()   <<<<<<<<<<<<<<-------------------------------------------- your method
                             1 System.Reflection.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo)
>

If you need more info about that stack, you can switch threads based on the id above the stack, it's hex, prefix it with 0x otherwise it will tell you that the thread id is invalid.

setthread --tid 0x36b # thread id from above

And dump the whole stack including the native calls:

clrstack -i -f


Dumping managed stack and managed variables using ICorDebug.
=============================================================================
Child SP         IP               Call Site
00007FD0816874F8 00007fd120d11390 [NativeStackFrame]
00007FD081806FA0 00007fd0abe00e74 00007FD081806FC0 (null) [Managed to Unmanaged transition: 00007FD081806FC0]
00007FD081807090 00007fd0abe00cfc [DEFAULT] [hasThis] ValueClass System.Drawing.SizeF System.Drawing.Graphics.MeasureString(String,Class System.Drawing.Font,ValueClass System.Drawing.SizeF,Class System.Drawing.StringFormat) (/app/MyTests/bin/Release/net5.0/runtimes/unix/lib/netcoreapp3.0/System.Drawing.Common.dll)
00007FD081807220 00007fd0abe009a6 [DEFAULT] [hasThis] ValueClass System.Drawing.SizeF System.Drawing.Graphics.MeasureString(String,Class System.Drawing.Font,ValueClass System.Drawing.SizeF) (/app/MyTests/bin/Release/net5.0/runtimes/unix/lib/netcoreapp3.0/System.Drawing.Common.dll)
00007FD081807250 00007fd0abe00946 [DEFAULT] [hasThis] ValueClass System.Drawing.SizeF System.Drawing.Graphics.MeasureString(String,Class System.Drawing.Font) (/app/MyTests/bin/Release/net5.0/runtimes/unix/lib/netcoreapp3.0/System.Drawing.Common.dll)
00007FD081807280 00007fd0abdfe473 [DEFAULT] [hasThis] Void MyTests.Tests2.TestMethod2() (/app/MyTests/bin/Release/net5.0/MyTests.dll)
00007FD0818072D0 00007fd1205c3f4b [NativeStackFrame]
00007FD081807540 (null) [Unmanaged to Managed transition: 00007FD081807540]
00007FD081807638 (null) [Internal call: 00007FD081807638]
00007FD0818077B0 00007fd0a6ce385c [DEFAULT] [hasThis] Object System.Reflection.RuntimeMethodInfo.Invoke(Object,ValueClass System.Reflection.BindingFlags,Class System.Reflection.Binder,SZArray Object,Class System.Globalization.CultureInfo) (/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.11/System.Private.CoreLib.dll)

This is where it happens, an unmanaged exception in the System.Drawing library. I have seen those crash tests few times before, and there is nothing we can do to prevent this in dotnet test.

nohwnd avatar Nov 04 '21 09:11 nohwnd

Nice - thanks @nohwnd. I will preserve the command line here for my own future reference of the flags... dotnet test --blame-crash --blame-crash-dump-type full --diag diag.log --configuration Release --no-build --no-restore -l "console;verbosity=detailed" --logger "trx;LogFileName=results.trx" MyTests

danmoseley avatar Nov 04 '21 14:11 danmoseley

@nohwnd Is there anything to do on our side or can we close this ticket?

Evangelink avatar Aug 03 '22 15:08 Evangelink

I don't think there is anything, System.Drawing was the root cause of such issues multiple times, we can't do anything about native code killing the process.

nohwnd avatar May 23 '23 09:05 nohwnd