fsharp Parallelize tests

Enable running xUnit tests in parallel.

To use xUnit means to customize it. Two customizations added:

Running collection and theory cases in parallel based on https://www.meziantou.net/parallelize-test-cases-execution-in-xunit.htm By default xUnit's unit of parallelization is test collection. Test cases in a collection run in sequence. Also, by default each class/module constitutes a collection. We have a lot of test cases in large modules and large theories that were bottlenecked by this. This customization enables parallelism in such cases. It can be reverted to default for a particular module with [<RunInSequence>] attribute.
Console streams captured universally and redirected to xUnit's output mechanism, which means you can just do printfn in a test case and it goes to the respective output. This can be inspected in the IDE and in case of failure is printed out when testing from the command line.

The default way in xUnit is to use ITestOutputHelper. This is very unwieldy, because it requires placing test cases in a class with a constructor, and then threading the injected output helper into any function that wants to output text. We have many tests in modules not classes, and many of the tests are using a lot of utility functions. Adjusting it all to use ITestOutputHelper is not feasible. OTOH just outputting with printfn is unobtrusive, natural and works well with interactive prototyping of test cases.

Some local run times:

dotnet test .\tests\FSharp.Compiler.ComponentTests\ -c Release -f net9.0

Test summary: total: 4489, failed: 0, succeeded: 4258, skipped: 231, duration: 199.0s

dotnet test .\tests\fsharp\ -c Release -f net9.0

Test summary: total: 579, failed: 0, succeeded: 579, skipped: 0, duration: 41.9s

dotnet test .\FSharp.sln -c Release  -f net9.0

Test summary: total: 12963, failed: 0, succeeded: 12694, skipped: 269, duration: 253.3s

Some considerations to make this work and keep it working To run tests in parallel we must deal with global resources and global state accessed by the test cases.

Out of proc: Tests running as separate processes are sharing the file system. We must make sure they execute in their own temporary directories and don't overwrite any hardcoded paths. This is already done, mostly in separate PR.

Hosted: Many tests use hosted compiler and FsiEvaluationSession, sharing global resources and global state within the runner process:

[x] Console streams - this is swept under a rug for now by using a simple AsyncLocal stream splitter.
[x] FileSystem global mutable of the file system shim - few tests that mutate it, must be excluded from parallelization.
[x] Environment.CurrentDirectory - many tests executing in hosted session were doing a variation of File.WriteAllText("test.ok", "ok") all in the current directory i.e. bin, leading to conflicts. This is replaced with a threadsafe mechanism.
[x] Environment variables, Path - mostly this applies to DependencyManager, excluded from parallelization for now.
[x] Async default cancellation token - few tests doing Async.CancelDefaultToken() must be excluded from parallelization.
[x] global state used in conjunction with --times option - tests excluded from parallelization.
[ ] global mutable state in the form of multiple caches implemented as ConcurrentDictionary. This needs further investigation.

I'll ad to the above list if I recall anything else.

Problems: Tests depending on tight timing, orchestrating stuff by combinations of Thread.Sleep, Async.Sleep and wait timeouts. These are mostly excluded from parallelization, some attempts at fixing things were made.

Obscure compiler bugs revealed in this PR:

Internal error: value cannot be null this mostly happens in coreClr, one time, sometimes a few times during the test run.
Error creating evaluation session because of NRE somewhere in TcImports.BuildNonFrameworkTcImports. This is more rare but may be related to the above.

These were related to some concurrency issues; modyfing frameworkTcImportsCache without lock and a bug in custom lazy implementation in il.fs. Hopefully both fixed now.

Running in parallel: Xunit runners are configured with mostly default parallelization settings.

dotnet test .\FSharp.sln -c Release -f net9.0 will run all discovered test assemblies in parallel as soon as they're built. This can be limited with the -m switch. For example, dotnet test -m:2 .\FSharp.Compiler.Service.sln will limit the test run to at most 2 simultaneous processes. Still, each test host process runs its test collections in parallel.

Some test collections are excluded form parallelization with [<Collection(nameof DoNotRunInParallel)>] attribute.

Running in the IDE with "Run tests in parallel" enabled will respect xunit.runner.json settings and the above exclusions.

TODO:

[x] Make sure this keeps working properly with BUILDING_USING_DOTNET scenario (#14677)

Oct 11 '24 11:10 majocha

:warning: Release notes required, but author opted out

[!WARNING] Author opted out of release notes, check is disabled for this pull request. cc @dotnet/fsharp-team-msft

Oct 11 '24 11:10 github-actions[bot]

Thanks for your endurance, Jakub 💪

Oct 15 '24 13:10 psfinaki

@psfinaki I will need some help with this Source-Build error: https://dev.azure.com/dnceng-public/public/_build/results?buildId=847523&view=logs&j=2f0d093c-1064-5c86-fc5b-b7b1eca8e66a&t=52d0a7a6-39c9-5fa2-86e8-78f84e98a3a2&l=45

At this moment this is very stable locally but will also probably need testing on other machines than mine :)

What's left to do is to tune this for stability in CI. I've been trying different things and timing runs. The most glaring problem is the testDesktop. In CI desktop runs both FSharpSuite and ComponentTests take around 40 minutes each. I guess slicing the test suite and running with multi-agent parallel strategy would improve things here. I added some simple provisions for easier slicing using traits: --filter ExecutionNode=n will now take a stable slice of the test suite (currently n is hardcoded 1..4)

I also noticed Linux run is constantly low on memory, this is unrelated as it happens on main, too. For this reason I set MaxParallelThreads=4 in build.sh to cool things down a bit.

Oct 18 '24 17:10 majocha

@majocha the error is weird, nothing comes to my mind right away. Let's rebase and rerun and see if it's still happening... Sorry, I know this is somewhat lame, it's just that SourceBuild is a Linux thing and it's not trivial to debug its issues locally.

As for cooling things down - I also noticed this today, thanks for addressing this.

What else do you think we can split from this PR into some separate ones?

Oct 21 '24 10:10 psfinaki

What else do you think we can split from this PR into some separate ones?

There are some small further fixes, maybe also the whole console handling does not really depend on parallel execution.

Somewhat related thing I have on my mind recently is to implement a FileSystem shim for tests that will be as much in-memory as possible and isolated per testcase. It wouldn't handle the tests that start separate processes, though.

Oct 21 '24 11:10 majocha

@majocha the error is weird, nothing comes to my mind right away. Let's rebase and rerun and see if it's still happening... Sorry, I know this is somewhat lame, it's just that SourceBuild is a Linux thing and it's not trivial to debug its issues locally.

Thanks! Rebasing did help.

Oct 21 '24 14:10 majocha

There are some small further fixes, maybe also the whole console handling does not really depend on parallel execution.

Yeah console handling would be probably good to isolate if possible.

Somewhat related thing I have on my mind recently is to implement a FileSystem shim for tests that will be as much in-memory as possible and isolated per testcase. It wouldn't handle the tests that start separate processes, though.

Just for my understanding, what would this add on top of the current results the PR achieves?

Oct 21 '24 14:10 psfinaki

Just for my understanding, what would this add on top of the current results the PR achieves?

This would be an experiment for another PR, but basically, I don't like all that copying to temp dirs that I added in recent PRs. A FileSystem shim just for testing, that virtualizes all of the writes and keeps track of which test case wrote what, to correctly isolate them, would be maybe possibly more performant and a cleaner solution.

Oct 21 '24 15:10 majocha

Right, yeah, I see. No it's worth playing with, although given that we don't touch these tests too much, it's probably worth seriously investing into only if it starts yielding reasonable performance fruits.

Oct 21 '24 17:10 psfinaki

https://dev.azure.com/dnceng-public/public/_build/results?buildId=863457&view=logs&j=7bab896a-24f8-544f-51eb-43745367a332&t=999dbed9-85e3-59ab-57f0-3e22828b5bad&l=3823 I've seen this test hang also locally, but it takes like few hundreds of iterations to trigger the deadlock (?).

Nov 07 '24 14:11 majocha

TODO, two test cases have poorer stability now: error on one workflow should cancel all others throws Test project1 and make sure TcImports gets cleaned up timeout (blame-hang-timeout turned out to be helpful) Needs fix or revert to original.

Nov 07 '24 16:11 majocha

😞 this is new. https://dev.azure.com/dnceng-public/public/_build/results?buildId=867529&view=logs&j=e2e0a842-382a-5483-7008-fc809f5eff8d&t=6a49776c-db36-5de0-b127-a6a0074bf26a&l=14859 Fatal error. Internal CLR error. (0x80131506)

https://dev.azure.com/dnceng-public/public/_build/results?buildId=867536&view=logs&j=2363a781-3528-55f4-1450-adabad4eb250&t=4ee64dfd-475b-5640-3e58-0c1d0f6688da&l=20 D:\a\_work\1\s\tests\EndToEndBuildTests\BasicProvider\BasicProvider\BasicProvider.fsproj : error MSB4057: The target "_GetRestoreSettingsPerFramework" does not exist in the project. [TargetFramework=net9.0] This also started today.

Nov 12 '24 12:11 majocha

/azp run

Nov 12 '24 14:11 psfinaki

Azure Pipelines successfully started running 2 pipeline(s).

Nov 12 '24 14:11 azure-pipelines[bot]

Most of the Utilities refactoring and the parallel console support is extracted to #17993. This excludes the xUnit customizations and parallelization of the tests. I hope to make sure this still works and tests fine with the state of the things we have in main.

Nov 12 '24 14:11 majocha

Awesome, thanks for splitting that - we will take a look there shortly :)

Nov 12 '24 15:11 psfinaki

I don't know what's up with the _GetRestoreSettingsPerFramework error, though.

Nov 12 '24 15:11 majocha

I renamed the parallelization disabling collection to NotThreadSafeResourceCollection. This is maybe more general and descriptive at the same time. Also it's literally in the recommendation here : https://github.com/xunit/xunit/issues/1999#issuecomment-522635397

Another thing is Service Tests were deparallelized in code and I missed it for all this time: https://github.com/dotnet/fsharp/blob/fb69e5870ee6d04e64858d281d41a7090ff28dd1/tests/FSharp.Compiler.Service.Tests/AssemblyInfo.fs#L1-L7

So this is now unblocked, should shave another minute or half. If the CI can't manage this, I'll throttle it back.

Nov 14 '24 10:11 majocha

MacOS seems to crash the test host now... in the current state of things, are the logs properly blameable now? We can use this as an exercise :)

Nov 14 '24 14:11 psfinaki

https://dev.azure.com/dnceng-public/public/_build/results?buildId=869817&view=logs&j=7bab896a-24f8-544f-51eb-43745367a332&t=999dbed9-85e3-59ab-57f0-3e22828b5bad&l=3815

MacOS seems to crash the test host now... in the current state of things, are the logs properly blameable now? We can use this as an exercise :)

Yes, blame-hang-timeout works as intended: https://dev.azure.com/dnceng-public/public/_build/results?buildId=869817&view=logs&j=7bab896a-24f8-544f-51eb-43745367a332&t=999dbed9-85e3-59ab-57f0-3e22828b5bad&l=3815

I'll exclude this GC test.

Edit: example blame-hang-timeout output:

Data collector 'Blame' message: The specified inactivity time of 5 minutes has elapsed. Collecting hang dumps from testhost and its child processes.
Results File: /Users/runner/work/1/s/artifacts/TestResults/Release/FSharp.Compiler.Service.Tests_net9.0.xml

Passed!  - Failed:     0, Passed:  1668, Skipped:    23, Total:  1691, Duration: 58 s - FSharp.Compiler.Service.Tests.dll (net9.0)
Test Run Aborted.

The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
The test running when the crash occurred: 
FSharp.Compiler.Service.Tests.ProjectAnalysisTests.Test project1 and make sure TcImports gets cleaned up

This test may, or may not be the source of the crash.

Nov 14 '24 14:11 majocha

Standard output is now printed to the console only in case of test failure: https://dev.azure.com/dnceng-public/public/_build/results?buildId=870812&view=logs&j=22fea640-1099-5f32-ec5d-316ad83f359a&t=4607427e-2085-572d-4b25-975ef0413f8f&l=3814 example:

  Failed FSharp.Compiler.Service.Tests.ProjectAnalysisTests.References from #r nuget are included in script project options [11 s]
  Error Message:
   Assert.Contains() Failure: Item not found in collection
Collection: ["System.Runtime.dll", "FSharp.Core.dll", "System.ComponentModel.TypeConverter.dll", "System.Diagnostics.Debug.dll", "System.Security.Cryptography.X509Certificates.dll", ···]
Not found:  "Dapper.dll"
  Stack Trace:
     at FSharp.Compiler.Service.Tests.ProjectAnalysisTests.References from #r nuget are included in script project options() in /home/vsts/work/1/s/tests/FSharp.Compiler.Service.Tests/ProjectAnalysisTests.fs:line 5806
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
  Standard Output Messages:
 System.Runtime.dll
 FSharp.Core.dll
 ...

This particular test case needs to be deparallelized possibly because Checker's scriptClosureCache is not thread-safe (?). Each such case is an opportunity to investigate and improve things.

Nov 15 '24 12:11 majocha

Standard output is now printed to the console only in case of test failure: https://dev.azure.com/dnceng-public/public/_build/results?buildId=870812&view=logs&j=22fea640-1099-5f32-ec5d-316ad83f359a&t=4607427e-2085-572d-4b25-975ef0413f8f&l=3814 example:

That's really cool I think.

This particular test case needs to be deparallelized possibly because Checker's scriptClosureCache is not thread-safe (?). Each such case is an opportunity to investigate and improve things.

One way for you to not go crazy because of our tests is to collect such tests in some GitHub issue and we might promote them as good first issues :)

Nov 18 '24 11:11 psfinaki

Standard output is now printed to the console only in case of test failure: https://dev.azure.com/dnceng-public/public/_build/results?buildId=870812&view=logs&j=22fea640-1099-5f32-ec5d-316ad83f359a&t=4607427e-2085-572d-4b25-975ef0413f8f&l=3814 example:

That's really cool I think.

This particular test case needs to be deparallelized possibly because Checker's scriptClosureCache is not thread-safe (?). Each such case is an opportunity to investigate and improve things.

One way for you to not go crazy because of our tests is to collect such tests in some GitHub issue and we might promote them as good first issues :)

😄 Yes, thats pretty much everything marked with NotThreadSafeResourceCollection in this PR.

Nov 18 '24 11:11 majocha

Not the first time I'm seeing this with .Net 9.0: Fatal error. Internal CLR error. (0x80131506)

Nov 19 '24 08:11 majocha

Do you have a hunch on if it's the same test producing this?

Nov 19 '24 12:11 psfinaki

Do you have a hunch on if it's the same test producing this?

I feel this is GC related, so no particular test. Let's make a note if it shows up again. This time it was:

The test running when the crash occurred: 
Conformance.BasicGrammarElements.CustomAttributes_AttributeUsage.E_ConditionalAttribute_fs

the previous time:

Conformance.Types.UnionStructTypes.Custom ValueOption keeps working

Nov 19 '24 14:11 majocha

Do you have a hunch on if it's the same test producing this?

This only happens in testCoreClr and only in CI so I'd just relax its parallelization in the .ps1 script even more. It's way faster than testDesktop anyway, so it won't slow things down overall.

I don't want to compromise the local dev experience because of it. dotnet test runs full speed so there is no problem here, but if someone runs Build.ps1 with -testCoreClr it would be better to not throttle it locally, I think.

Nov 20 '24 11:11 majocha

https://github.com/dotnet/fsharp/pull/18065 this should fix dumps collection (I hope).

Nov 26 '24 15:11 vzarytovskii

Do you have a hunch on if it's the same test producing this?

fyi, it's not a test which producing this, it's a runtime issue (gc hole? jit issue? generic ave?).

Nov 26 '24 15:11 vzarytovskii

There it is again: https://dev.azure.com/dnceng-public/public/_build/results?buildId=879847&view=logs&j=170942dc-5349-5022-2275-77744f335216&t=20776a93-33c2-52f0-ee8f-b8c9809b2e23&l=13071

Nov 26 '24 17:11 majocha