Race in GetProjectOptionsFromScript prevents getting referenced packages
Repro steps:
- call
GetProjectOptionsFromScript - start the resulting async as task and pass a cancellation token
- request cancellation on the token source
- call
GetProjectOptionsFromScriptagain (probably from a different thread) - start the task with a new cancellation token
The GetProjectOptionsFromScript calls are made using lock application to prevent concurrent access.
Calling and cancelling it like this may lead to situation where two scrips closures are calculated concurrently, despite one of them has already been cancelled by the callee. Here're two screenshots for different threads, both made while staying on the same breakpoint:
This leads to a situation where both threads try to write a generated file during the package restore. If the file is locked then the restore fails and results with missing package references are added to cache inside DependencyProvider, so all later GetProjectOptionsFromScript invocations return failed results for the same compiler directive collection.
The build failure details
The cached script closure:

Exception stacktrace:

I don't quite understand the likely nature of the bug from the above
Calling and cancelling it like this may lead to situation where two scrips closures are calculated concurrently, despite one of them has already been cancelled by the callee.
When you say "calculated concurrently" could you indicate which routine or code you think is being executed concurrently for the same script?
Also when you say "has been cancelled" do you mean that the cancellation hasn't been effective?
Or is it more that a bad failing result (related to cancellation) has been written to the cache?
When you say "calculated concurrently" could you indicate which routine or code you think is being executed concurrently for the same script?
@dsyme Yes, please check the stacks on the attached screenshots. 🙂
Also when you say "has been cancelled" do you mean that the cancellation hasn't been effective?
I think this is what is happening indeed, the cancellation tokens are also captured on the screenshots.
Or is it more that a bad failing result (related to cancellation) has been written to the cache?
The very last screenshot shows a cached failure to write to a locked file inside an MSBuild task.
Looks like FSharpDependencyManager is allowing multiple concurrent callers to attempt resolutions within its working directory at the same time
@KevinRansom Could you take a look at this? I think FSharpDependencyManager needs to be made concurrency safe - not its data structures, but rather its use of the working directory as a dedicated non-concurrent resource.
I shall take a look. For sure there is no real reason beyond debuggability that we reuse the working directory. Although, when trying to figure out what is going on, that is my go to.
I suppose a risk of using separate directories (ie fixing this bug) is that we will end up spawning hundreds of resolutions as the programmer is editing the references of a script, each typing of a character doing a new spawn. Right now the bug is probably gating the process of doing resolutions in some way.
Equally we've gradually moved away from have FSharp.Compiler.Service manage resources and act as a mini-operating-system (e.g. by removing the reactor thread), but it's something to be aware of and watch for.
I suppose a risk of using separate directories (ie fixing this bug) is that we will end up spawning hundreds of resolutions as the programmer is editing the references of a script, each typing of a character doing a new spawn.
It should only create new directories/files for unique dependency provider keys, so it won't be happening for each keystroke, unless changing the reference directives: https://github.com/dotnet/fsharp/blob/1490dd5d8abfc5fe6609e56a174914cf72306179/src/fsharp/DependencyManager/DependencyProvider.fs#L425
And when changing the directives, it currently may reuse the files, leading to hard-to-diagnose race issues.