How to run multiple parallel SandboxProcesses
I'm trying to run multiple MSBuild processes in parallel via SandboxedProcessFactory. Each process is given a different FileAccessManifest to block access to certain directories (which vary from process to process). I'm finding that the blocking is inconsistent such that it looks like the processes are stepping on each other. I get random blocks pretty reliably on some portion of the processes (always changing), but if I retry with the same manifests, they succeed. One other observation is that if I run the processes serially, everything works without issue.
My sandboxed process info looks like this:
var info = new SandboxedProcessInfo(
    fileAccessManifest.PathTable,
    null,
    msbuildPath,
    disableConHostSharing: true,
    loggingContext: new LoggingContext(nameof(BuildXLMSBuildInvoker)))
{
    Arguments = arguments,
    PipDescription = "Run MSBuild",
};
Is there a way to isolate the sandboxed processes from each other?
The SandboxedProcessFactory is safe to use in a multi-process context. This factory is used by the build engine as well when executing concurrent processes. There is no interaction between sandboxes that I'm aware of. There could be interactions between running processes though when accessing shared files, that could result in different behavior when running concurrently. In order to get additional data, my suggestion is to hook up a detours listener (one of the optional arguments of the SandboxedProcessInfo) so you can get every single access that each sandbox detected, and whether it was allowed or not.
Thanks for the quick response!
I tried adding an IDetoursEventListener but for some reason I don't get all the files I would expect. The paths seem to all be msbuild.exe or csc.exe. Maybe it's only capturing events from the first level and not child processes?
When I use ReportFileAccesses = true, I can see after the process finishes that paths (always .csproj files) were Denied where they should not be based on the manifest.
One thing I'm still hazy on is the mask parameter when adding a scope. I'm using MaskNothing for the initial scope (giving AllowReadIfNonexistent to everything). Then I use MaskAll for the remaining scopes where I allow read/deny access to various special folders & source folders or all access to output folders. Is that how I should be doing this?
> Maybe it's only capturing events from the first level and not child processes?
The sandbox captures events from the entire process tree unless you specify otherwise in the manifest (there is a MonitorChildProcesses flag, which is true by default). I wonder whether, because access to the csproj files is being denied, msbuild simply does not get far enough to read any other files. One possible experiment is to set FailUnexpectedFileAccesses to false in the manifest: you will still get reports of which accesses were allowed or denied, but the actual OS call will be allowed, so tools should be able to make progress as usual regardless of the manifest mask settings. The other thing that comes to mind is that you may not be setting the Report option appropriately at the manifest level. See below for more details on that last point.
> One thing I'm still hazy on is the mask parameter when adding a scope. I'm using MaskNothing for the initial scope (giving AllowReadIfNonexistent to everything). Then I use MaskAll for the remaining scopes where I allow read/deny access to various special folders & source folders or all access to output folders. Is that how I should be doing this?
You can think of the mask as the way to specify how to treat the access policy inherited from parent scopes (the parent scope's policy gets ANDed with the mask), and the values as the policies you are adding to the current scope (they get ORed with the masked parent policy). You can check how the engine sets that for regular process execution. The root scope is set here and then, depending on the type of construct specified in the engine, it sets child scopes based on those. You may be able to follow these pointers and extrapolate the masks and values you need for your specific case.
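To make the AND/OR description concrete, here is a minimal sketch of how a child scope's policy could be derived from its parent. The `Policy` enum and the `ScopePolicyModel` class are local stand-ins written for this illustration, not the library's own types. The bit values are chosen so that the allow-everything combination equals 271, matching the policy dump later in this thread. `MaskAll` and `MaskNothing` are assumed here to mean "keep nothing from the parent" and "keep everything from the parent".

```csharp
using System;

// Local stand-in for illustration only; NOT the library's policy type.
[Flags]
public enum Policy
{
    Deny = 0,
    Read = 1,
    Write = 2,
    ReadIfNonexistent = 4,
    CreateDirectory = 8,
    CreateSymlink = 256,
}

public static class ScopePolicyModel
{
    // Assumption: MaskAll clears every inherited bit, MaskNothing keeps them all.
    public const Policy MaskAll = 0;
    public const Policy MaskNothing = (Policy)(~0);

    // Every allow bit in this model; numerically 271.
    public const Policy AllowAll =
        Policy.Read | Policy.Write | Policy.ReadIfNonexistent |
        Policy.CreateDirectory | Policy.CreateSymlink;

    // The parent's policy is ANDed with the mask, then the new values are ORed in.
    public static Policy Effective(Policy parent, Policy mask, Policy values)
    {
        return (parent & mask) | values;
    }
}
```

In this model, `Effective(Policy.Deny, MaskNothing, AllowAll)` yields the allow-everything root, and `Effective(root, MaskAll, Policy.Deny)` yields a scope with no permissions at all. A mask of `~Policy.Write` with `Policy.Deny` as the values would remove only write access while keeping the rest of what the parent allowed.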
I'm getting very weird behavior. I set FailUnexpectedFileAccesses = false and ReportFileAccesses = true but I'm still seeing random denied access (e.g. MSB4025 The project file could not be loaded. Access to the path '...' is denied.). I simplified the manifest and made it look very similar to the block files example project. I allow all by default (Invalid, MaskNothing, AllowAll) and then block some source directories (C:\Dev...\src..., MaskAll, Deny & ReportAccess). I don't get blocks when I use a version of my application that does not include sandboxing.
Some sanity checks...
- I'm using the LKG version binaries 0.1.0-20250222.1 and rolled a custom nupkg with just the handful of binaries that were needed to implement the sandboxing. Is it possible I'm missing a binary that's silently being ignored? My nupkg includes Processes/Native/Utilities.Core/RuntimeContracts and the BuildXLNatives.dll and DetoursServices.dll from x64 (these last 2 are just copied to the output directory's x64 directory).
- Could Windows Defender/anti-virus be interfering with the detours?
- Just to be clear, I have a single parent process that is invoking multiple MSBuild processes via sandboxing. It wasn't clear whether your multi-process case was multiple sandboxed processes running on the same host via different parent processes or a single parent process. Is having one parent process monitor multiple sandboxed processes supported?
> I'm using the LKG version binaries 0.1.0-20250222.1 and rolled a custom nupkg with just the handful of binaries that were needed to implement the sandboxing. Is it possible I'm missing a binary that's silently being ignored? My nupkg includes Processes/Native/Utilities.Core/RuntimeContracts and the BuildXLNatives.dll and DetoursServices.dll from x64 (these last 2 are just copied to the output directory's x64 directory).
I doubt that this is about missing binaries. I'd expect things to explode in a bad way in that case.
> Could Windows Defender/anti-virus be interfering with the detours?
We haven't hit any problems with Defender interacting with detours that I'm aware of (besides slowdowns).
> Is having one parent process monitor multiple sandboxed processes supported?
Yes, that should work fine. BuildXL does essentially the same when wrapping every execution required by a build in a sandbox. The only scenario I'm aware of that doesn't really work is if you try to nest sandboxes (that is, a process running in a sandbox tries to set up a sandbox).
Maybe you can share some of the code you are trying to run, or a small repro? I can take a look and see whether I can spot what's going on.
I haven't had much luck creating a repro. I tried creating 50 fake csproj files each in a separate directory and then running msbuild against each in parallel while blocking access to the other 49 directories. Worked without issue.
I was able to reproduce the problem using my application against a smaller codebase. I had it output the policy for the one that failed:
[] Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) '\'
[C:] <Scope> PathID:0x10000001 Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) {Root Scope}
[DEV] <Scope> PathID:0x10000002 Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) {Root Scope}
[FRAMEWORK-TEST] <Scope> PathID:0x10000003 Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) {Root Scope}
[SRC] <Scope> PathID:0x10000004 Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) {Root Scope}
[UTIL] <Scope> PathID:0x10000005 Cone Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) Node Policy:271 (Read|Write|ReadIfNonexistent|CreateDirectory|CreateSymlink) {Root Scope}
[BLAZOR.TESTS] <Path> PathID:0x10000015 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[SQLSERVER.TESTS] <Path> PathID:0x10000012 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[BUILD] <Scope> PathID:0x10000009 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[BLAZOR] <Scope> PathID:0x10000014 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[BUILD.TESTS] <Path> PathID:0x1000000A Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[PRIMITIVES.TESTS] <Path> PathID:0x1000000F Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[FRAMEWORK.TESTS] <Path> PathID:0x1000000C Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[SQLSERVER] <Scope> PathID:0x10000011 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[BLAZOR.TESTSUPPORT] <Path> PathID:0x10000016 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[SQLSERVER.TESTSUPPORT] <Path> PathID:0x10000013 Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[FRAMEWORK] <Scope> PathID:0x1000000B Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
[FRAMEWORK.TESTSUPPORT] <Path> PathID:0x1000000D Cone Policy:0 ((0, Deny)) Node Policy:0 ((0, Deny)) {Root Scope}
The error I got was:
MSBUILD : error MSB4166: Child node "6" exited prematurely. Shutting down. Diagnostic information may be found in files in "C:\Users\mwelsh\AppData\Local\Temp\MSBuildTemp\" and will be named MSBuild_*.failure.txt. This location can be changed by setting the MSBUILDDEBUGPATH environment variable to a different directory.
The temp file contained:
UNHANDLED EXCEPTIONS FROM PROCESS 113228:
=====================
3/14/2025 1:53:19 PM
Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: Node 7 does not have a provider.
at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Boolean needToValidateProject, String schemaFile, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, TextWriter targetsWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsNotAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, ProjectIsolationMode isolateProjects, GraphBuildOptions graphBuildOptions, Boolean lowPriority, Boolean question, Boolean isBuildCheckEnabled, String[] inputResultsCaches, String outputResultsCache, Boolean saveProjectResult, BuildResult& result, Boolean reportFileAccesses, String commandLine)
at Microsoft.Build.CommandLine.MSBuildApp.Execute(String commandLine)
at Microsoft.Build.CommandLine.MSBuildApp.Main()
===================
Does this help at all?
> I haven't had much luck creating a repro. I tried creating 50 fake csproj files each in a separate directory and then running msbuild against each in parallel while blocking access to the other 49 directories. Worked without issue.
Sorry, didn't get this part. You blocked 49 directories via configuring detours and allowed the remaining one. And that worked fine? What is the scenario that doesn't work then?
The policy that you are printing out does seem to be denying accesses for these .TEST* files. But if you run with FailUnexpectedFileAccesses = false those denied accesses will only be reported back, but running tools won't be denied access to those paths. So in that case the sandbox shouldn't get in the way of MSBuild.
Sorry, I was not very precise... I've been trying to create a simple repro that I could share with you. That was the 50 parallel MSBuild calls, which, unfortunately, did not reproduce the problem.
My real application is a fairly complex build engine which compiles a roughly ~10 million line codebase. That's where I'm seeing the problems when I introduce sandboxing. For some reason setting FailUnexpectedFileAccesses = false still seems to result in file blocks (or so I assume based on the MSBuild crash).
I can't share the real application (or the codebase), but I'll keep trying to strip away extraneous parts to see if I can narrow down the cause and hopefully mirror that in a simple repro.
I think I found the problem -- node-reuse. I think I'm getting weird inconsistent behavior because some of the file accesses are happening on shared MSBuild server processes. I'm guessing that whatever file manifest was active when the server process was started is dictating what it can/cannot access.
I added /nr:false and so far it's working as I would expect. The one thing I'm not sure about is: if I run my application with sandboxing and /nr:false but on the same box as Visual Studio (which will still enable node-reuse), does that still guarantee that the sandboxed MSBuild won't reuse a shared MSBuild from Visual Studio? Or does the /nr:false only suppress new shared MSBuild processes (but possibly use a shared MSBuild that is already running)?
Good info. I can see MSBuild node reuse being an issue. I don't think that explains the access denied issues though. The sandbox should only block when it is instructed to block. But assuming some access blocking was happening, I can see node reuse making this behavior non-deterministic or harder to understand in terms of provenance.
If there is an existing MSBuild process running that your application just happens to find and connect to, then that can definitely escape the sandbox. But I'm not super familiar with the details of /nr when it comes to MSBuild and whether that is actually possible.
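One way to check whether an already-running MSBuild process is even a candidate for being picked up is to list them before launching the sandboxed build. The sketch below uses only `System.Diagnostics.Process`. The class and method names are hypothetical, and it assumes the nodes run under the process name `MSBuild`. Builds driven through the `dotnet` host would show up under a different process name and are not covered here. Finding a process this way does not prove that it would be reused; it only shows that a candidate exists.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: lists processes that might be lingering MSBuild nodes.
public static class MSBuildNodeProbe
{
    // Returns the process IDs of all processes currently named "MSBuild".
    public static int[] FindRunningMSBuildPids()
    {
        Process[] processes = Process.GetProcessesByName("MSBuild");
        try
        {
            return processes.Select(p => p.Id).ToArray();
        }
        finally
        {
            // Process objects hold OS handles, so release them explicitly.
            foreach (Process p in processes)
            {
                p.Dispose();
            }
        }
    }
}
```

Logging the returned IDs right before each sandboxed launch gives a record of which MSBuild processes were alive at the time a flaky build started.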
> Is having one parent process monitor multiple sandboxed processes supported?
That's what BuildXL does. For every process that needs to be launched during a build session, BuildXL wraps it inside our sandboxed process so that we can monitor file accesses. These reported file accesses are essential for caching.
> node-reuse
Yes, using node reuse is a bit problematic. We had the same experience with other build engines that try to run multiple MSBuild processes through our sandboxed processes.
In short, you need to disable MSBuild node reuse.
Setting the environment variable MSBUILDDISABLENODEREUSE to 1 is only a best effort, because users can always specify /nr:true or /nodeReuse:true on the MSBuild command line. We sometimes had to rewrite the user's command line to ensure that node reuse is disabled.
If node reuse is not disabled, then even when the MSBuild process itself is detoured, it can still use any lingering undetoured nodes to execute the build. Executions that go to those undetoured nodes become unobservable, i.e., you won't get reported file accesses.
On the other hand, if it goes to a detoured node that uses a different file access manifest, you may get unexpected file access behavior, like access denied.
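The command-line rewriting described above could look roughly like the following sketch. The `NodeReuseGuard` class and its method names are hypothetical. It assumes the arguments are already split into individual strings, drops any existing node-reuse switch (`/nr`, `-nr`, `/nodeReuse`, `-nodeReuse`, in any casing), and appends `/nr:false`. It also sets MSBUILDDISABLENODEREUSE in an environment dictionary that the caller would then pass to whatever mechanism launches the process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: forces node reuse off for an MSBuild invocation.
public static class NodeReuseGuard
{
    // True for /nr, -nr, /nodeReuse, -nodeReuse, with or without a ":value" suffix.
    private static bool IsNodeReuseSwitch(string arg)
    {
        if (arg.Length < 2 || (arg[0] != '/' && arg[0] != '-'))
        {
            return false;
        }

        string name = arg.Substring(1).Split(':')[0];
        return name.Equals("nr", StringComparison.OrdinalIgnoreCase)
            || name.Equals("nodeReuse", StringComparison.OrdinalIgnoreCase);
    }

    // Removes any user-supplied node-reuse switch and appends /nr:false.
    public static List<string> ForceNodeReuseOff(IEnumerable<string> args)
    {
        List<string> result = args.Where(a => !IsNodeReuseSwitch(a)).ToList();
        result.Add("/nr:false");
        return result;
    }

    // Returns a copy of the environment with MSBUILDDISABLENODEREUSE set to 1.
    public static Dictionary<string, string> WithNodeReuseDisabled(
        IDictionary<string, string> environment)
    {
        var copy = new Dictionary<string, string>(environment, StringComparer.OrdinalIgnoreCase);
        copy["MSBUILDDISABLENODEREUSE"] = "1";
        return copy;
    }
}
```

For example, `ForceNodeReuseOff(new[] { "app.csproj", "/NR:true", "/m" })` produces `app.csproj`, `/m`, `/nr:false`. This sketch does not look inside response files, so a switch supplied that way would not be removed.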
Hope this helps.
I was never able to reproduce the errors in a sample application (only with my full application & codebase). Even with MSBUILDDISABLENODEREUSE=1, I was still getting unreliable behavior in my real case. I'm going to give up on running sandboxed msbuild in parallel on a workstation.
When I use sandboxing in our CI server environment with each process running msbuild once (and nodeReuse:false), it works correctly and blocks access only where I tell it to. That should be good enough. We can live with catching problems at CI time as opposed to while developing on a workstation.