sentry-dotnet

Windows builds are failing intermittently on CI

Open jamescrosswell opened this issue 9 months ago • 4 comments

See here, for an example. The error log shows:

Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.50\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: Could not link native shared library: libxamarin-app.so [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.50\tools\Xamarin.Android.Common.targets(2038,3): error XA3007:  [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.50\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: stderr | ld: error: failed to write output 'obj\Release\net9.0-android\android-x64\app_shared_libraries\x86_64\libxamarin-app.so': permission denied [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]

This comes from here in Xamarin.Android.Common.targets.

@vaind commented in slack that this might be a concurrency issue:

Could not link native shared library: libxamarin-app.so And I've also got that on a clean build locally. I believe it's a concurrency issue, because running dotnet build again on the same project succeeded.

I can also reproduce this reliably on a clean build locally.

... now all we have to do is work out why 🙃

jamescrosswell avatar Apr 08 '25 02:04 jamescrosswell

Potentially we're running out of disk space on the Windows CI runners...

I've got about 600 GB free on the dev drive so that's definitely not the case for me locally.

vaind avatar Apr 27 '25 08:04 vaind

Yeah I can reproduce this locally sometimes when running a build after git clean -dfx.

I suspect it's related to this: https://github.com/getsentry/sentry-dotnet/blob/13d4a9f2038e142f9dc8c7799df06edfe17a2dd4/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj#L19-L24

I'll put together a fix.

jamescrosswell avatar Apr 27 '25 23:04 jamescrosswell

We're still seeing this in CI - e.g. here: https://github.com/getsentry/sentry-dotnet/actions/runs/15009965767/job/42176826796

jamescrosswell avatar May 14 '25 02:05 jamescrosswell

Just hit this one too: https://github.com/getsentry/sentry-dotnet/actions/runs/15376855613/job/43262645227?pr=4236 Happened earlier this week and I just reran without a thought.

So while it doesn't happen extremely often, it's impactful enough to be an issue.

bruno-garcia avatar Jun 01 '25 16:06 bruno-garcia

Just hit this again:

Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: Could not link native shared library: libxamarin-app.so [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007:  [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: stderr | ld: error: failed to write output 'obj\Release\net9.0-android\android-x64\app_shared_libraries\x86_64\libxamarin-app.so': permission denied [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]

bruno-garcia avatar Jul 12 '25 16:07 bruno-garcia

Could it be because these libraries are somehow included multiple times?

  Sentry.Samples.Maui net9.0-android35.0 succeeded with 4 warning(s) (16,8s) → samples/Sentry.Samples.Maui/bin/Release/net9.0-android35.0/android-x64/Sentry.Samples.Maui.dll
    obj/Release/net9.0-android35.0/android-x64/lp/173/jl/jni/x86_64/libsentry-android.so : warning XA4301: APK already contains the item lib/x86_64/libsentry-android.so; ignoring.
    obj/Release/net9.0-android35.0/android-x64/lp/173/jl/jni/x86_64/libsentry.so : warning XA4301: APK already contains the item lib/x86_64/libsentry.so; ignoring.
    obj/Release/net9.0-android35.0/android-x64/lp/173/jl/jni/x86_64/libsentry-android.so : warning XA4301: APK already contains the item lib/x86_64/libsentry-android.so; ignoring.
    obj/Release/net9.0-android35.0/android-x64/lp/173/jl/jni/x86_64/libsentry.so : warning XA4301: APK already contains the item lib/x86_64/libsentry.so; ignoring.

With parallelized jobs on Windows, it results in a "permission denied" error when multiple jobs try to write the same files at the same time.

jpnurmi avatar Jul 16 '25 21:07 jpnurmi

Could it be because these libraries are somehow included multiple times?

I think those are just warnings related to the Sentry.Samples.Maui project... that project isn't what's causing the build errors though.

The error we're seeing is when building the AndroidTestApp... which happens in the Sentry.Android.AssemblyReader.Tests.csproj. The guts of the error message is this:

failed to write output 'obj\Release\net9.0-android\android-x64\app_shared_libraries\x86_64\libxamarin-app.so': permission denied

In theory, each of the builds should be output to a unique directory based on the target framework plus a config string: https://github.com/getsentry/sentry-dotnet/blob/a9a6811b87796bb700e21003b8b07a40d6e20773/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj#L45-L50

But on Windows, apparently, even though the builds have a unique OutDir they're sharing the obj\Release\net9.0-android\android-x64\app_shared_libraries\x86_64\ directory at some point during the build.
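
To make the suspected collision concrete, here is a small illustrative sketch. The variant names follow the A=/S=/C= pattern seen in the CI logs, but the exact set of variants is an assumption, and `list_paths` and `shared_paths` are hypothetical helper names. It only compares path strings and does not run any build.

```shell
# Illustration only: the list of variants below is an assumption.
tfm="net9.0-android"
variants="A=False-S=False-C=False A=False-S=True-C=False A=False-S=True-C=True"

list_paths() {
  for v in $variants; do
    # OutDir includes the config string, so it is unique per variant.
    echo "out bin/$tfm/$v/"
    # The linker output path from the error message has no config string.
    echo "obj obj/Release/$tfm/android-x64/app_shared_libraries/x86_64/libxamarin-app.so"
  done
}

# Print every path that more than one variant would write to.
shared_paths() {
  list_paths | sort | uniq -d
}

shared_paths
```

Only the libxamarin-app.so path is reported as shared, which matches the file named in the "permission denied" error.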

If we could somehow force those builds to be run sequentially rather than in parallel, that might solve the issue...

jamescrosswell avatar Jul 16 '25 22:07 jamescrosswell

Perhaps BuildInParallel=false could help, but possibly even better with unique \obj dirs? AI-assisted proposal with BaseIntermediateOutputPath:

diff --git a/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj b/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj
index 0bbb2b5b..0f04dd12 100644
--- a/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj
+++ b/test/Sentry.Android.AssemblyReader.Tests/Sentry.Android.AssemblyReader.Tests.csproj
@@ -47,7 +47,7 @@
       <SourceAPK>..\AndroidTestApp\bin\$(TargetFramework)\$(_ConfigString)\com.companyname.AndroidTestApp-Signed.apk</SourceAPK>
       <DestinationAPK>TestAPKs\$(TargetFramework)-$(_ConfigString).apk</DestinationAPK>
     </PropertyGroup>
-    <MSBuild Projects="..\AndroidTestApp\AndroidTestApp.csproj" Targets="Build" Properties="Configuration=Release;PublishAot=$(_Aot);_IsPublishing=true;RuntimeIdentifier=android-x64;AndroidUseAssemblyStore=$(_Store);AndroidEnableAssemblyCompression=$(_Compressed);OutDir=bin\$(TargetFramework)\$(_ConfigString)\" Condition="!Exists('$(DestinationAPK)')" />
+    <MSBuild Projects="..\AndroidTestApp\AndroidTestApp.csproj" Targets="Build" Properties="Configuration=Release;PublishAot=$(_Aot);_IsPublishing=true;RuntimeIdentifier=android-x64;AndroidUseAssemblyStore=$(_Store);AndroidEnableAssemblyCompression=$(_Compressed);OutDir=bin\$(TargetFramework)\$(_ConfigString)\;BaseIntermediateOutputPath=obj\$(TargetFramework)\$(_ConfigString)\" Condition="!Exists('$(DestinationAPK)')" />

     <Message Text="Copying APK from $(SourceAPK) to $(DestinationAPK)" Importance="high" />
     <Copy SourceFiles="$(SourceAPK)" DestinationFiles="$(DestinationAPK)" Condition="!Exists('$(DestinationAPK)')" />

jpnurmi avatar Jul 17 '25 06:07 jpnurmi

BaseIntermediateOutputPath

That sounds promising. Do you want to give it a go or should I?

jamescrosswell avatar Jul 17 '25 06:07 jamescrosswell

btw @jpnurmi from memory, this can be reproduced locally, on a Windows machine, by running git clean -dfx and then trying to build. The first build often fails with the same error we're seeing on CI. Subsequent builds then work.

It may not be 100% reliable but it happens often enough that you could probably verify: a) You see the behaviour we don't want prior to your fix b) Everything is rainbows and unicorns after the fix

jamescrosswell avatar Jul 17 '25 07:07 jamescrosswell

Pfft, linux-x64 didn't like it 😅

Error: /usr/share/dotnet/sdk/9.0.301/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(266,5): error NETSDK1004: Assets file '/home/runner/work/sentry-dotnet/sentry-dotnet/test/AndroidTestApp/obj/net9.0-android/A=False-S=False-C=False/project.assets.json' not found. Run a NuGet package restore to generate this file. [/home/runner/work/sentry-dotnet/sentry-dotnet/test/AndroidTestApp/AndroidTestApp.csproj::TargetFramework=net9.0-android]
Error: /usr/share/dotnet/sdk/9.0.301/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(266,5): error NETSDK1004: Assets file '/home/runner/work/sentry-dotnet/sentry-dotnet/test/AndroidTestApp/obj/net8.0-android/A=False-S=False-C=False/project.assets.json' not found. Run a NuGet package restore to generate this file. [/home/runner/work/sentry-dotnet/sentry-dotnet/test/AndroidTestApp/AndroidTestApp.csproj::TargetFramework=net8.0-android]
  • https://github.com/getsentry/sentry-dotnet/pull/4363

jpnurmi avatar Jul 17 '25 07:07 jpnurmi

Not ideal, but we are only seeing this issue on Windows... so potentially we could use different build arguments on Windows vs other platforms?
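
One way to sketch "different arguments on Windows" at the command-line level, rather than inside the csproj, is shown below. It combines this idea with the earlier suggestion of running the builds sequentially by passing MSBuild's -m:1 switch only on Windows. This is untested against this issue, and `extra_build_args` is a hypothetical helper name.

```shell
# Hypothetical helper: given the output of `uname -s`, print extra build
# arguments. On Windows shells (Git Bash, MSYS2, Cygwin) it limits MSBuild
# to a single node with -m:1; elsewhere it prints nothing.
extra_build_args() {
  case "$1" in
    MINGW*|MSYS*|CYGWIN*) echo "-m:1" ;;
    *) echo "" ;;
  esac
}

# Example:
# dotnet build -c Release $(extra_build_args "$(uname -s)") Sentry-CI-Build-Windows.slnf
```

Limiting MSBuild to one node would slow the whole Windows build, so this is more of a diagnostic than a fix.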

jamescrosswell avatar Jul 17 '25 07:07 jamescrosswell

btw @jpnurmi from memory, this can be reproduced locally, on a Windows machine, by running git clean -dfx and then trying to build. The first build often fails with the same error we're seeing on CI. Subsequent builds then work.

Thanks for the tip! Much faster than trying to repeat in the CI...

jpnurmi avatar Jul 17 '25 07:07 jpnurmi

I'm having difficulties reproducing the problem locally using the latest main branch on an old Surface Laptop 4 with a quad-core i7-1185G7.

First, I ran git clean -xdf && dotnet build -c Release Sentry-CI-Build-Windows.slnf about 10 times in a row in the background while doing other stuff, and it succeeded every time.
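
For anyone wanting to repeat this, the loop could be scripted roughly as follows. This is a minimal sketch assuming a POSIX-style shell such as Git Bash; `run_repeated` and `clean_build` are hypothetical helper names, and the build command is the one quoted above.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# The body runs in a subshell, so its variables do not leak into the caller.
run_repeated() (
  count="$1"
  shift
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    if ! "$@"; then
      failures=$((failures + 1))
      echo "run $i failed" >&2
    fi
    i=$((i + 1))
  done
  echo "$failures of $count runs failed"
  # Exit status is zero only if every run succeeded.
  [ "$failures" -eq 0 ]
)

# One clean-build attempt. Note that git clean -xdf deletes all untracked
# and ignored files in the working tree.
clean_build() {
  git clean -xdf && dotnet build -c Release Sentry-CI-Build-Windows.slnf
}

# Example:
# run_repeated 10 clean_build
```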

Then, I tried applying various combinations and loads of CPU, memory, and disk pressure with HeavyLoad in hopes of emulating a busy CI environment. That only managed to make the builds slow; they still succeeded 5 times in a row.

Considering how often we've seen the failure in the CI lately, there should be a way to reproduce this. I'm running out of ideas... 🤔

jpnurmi avatar Jul 17 '25 10:07 jpnurmi

Hmm ... tricky ... perhaps with #4358 (thanks @jpnurmi) we now have more diagnostics to understand the problem the next time it occurs.

Flash0ver avatar Jul 17 '25 18:07 Flash0ver

Did setting the BaseIntermediateOutputPath not work?

jamescrosswell avatar Jul 18 '25 06:07 jamescrosswell

I was not yet able to fix the build with (Base)IntermediateOutputPath but that direction seems worth exploring more. There are various issues with restore/javac etc. depending on how the intermediate output paths are configured.

jpnurmi avatar Jul 18 '25 08:07 jpnurmi

I think I got it working with MSBuild batching that allows disabling parallelization for conflicting builds:

  • https://github.com/getsentry/sentry-dotnet/pull/4363

A surprising side effect of building the AndroidTestApp variants one by one is that I had to skip PublishAot=True builds on Windows due to error: Cross-OS native compilation is not supported. Not sure how it can have passed the CI on Windows before.

Update: It was due to !$(TargetFramework.StartsWith('net8')) conditions accidentally going missing. 🤦‍♂️

jpnurmi avatar Jul 21 '25 21:07 jpnurmi

I think I got it working with MSBuild batching that allows disabling parallelization for conflicting builds

Ah cool... TIL!

Not sure how it can have passed the CI on Windows before.

Also not sure about this one. According to these docs, the builds should have failed on Windows... 😕

jamescrosswell avatar Jul 21 '25 21:07 jamescrosswell

The problem is still there :(

Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: Could not link native shared library: libxamarin-app.so [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007:  [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]
Error: C:\Program Files\dotnet\packs\Microsoft.Android.Sdk.Windows\35.0.61\tools\Xamarin.Android.Common.targets(2038,3): error XA3007: stderr | ld: error: failed to write output 'obj\Release\net9.0-android\android-x64\app_shared_libraries\x86_64\libxamarin-app.so': permission denied [D:\a\sentry-dotnet\sentry-dotnet\test\AndroidTestApp\AndroidTestApp.csproj::TargetFramework=net9.0-android]

https://github.com/getsentry/sentry-dotnet/actions/runs/16508654116/job/46685704446#step:17:2038

jpnurmi avatar Jul 25 '25 03:07 jpnurmi

The problem is still there :(

Frustrating... and only on Windows.

I wonder if it's related to the restore operation rather than the build.

Will test that theory in https://github.com/getsentry/sentry-dotnet/pull/4400

jamescrosswell avatar Jul 29 '25 23:07 jamescrosswell

Frustratingly, that last attempt at fixing it didn't work either. See: https://github.com/getsentry/sentry-dotnet/actions/runs/16734701092/job/47389203197?pr=4406

jamescrosswell avatar Aug 05 '25 06:08 jamescrosswell

-bl

We were thinking about using --no-restore in the build ... but this may have no actual effect on the build. Now we are thinking about skipping these tests on Windows.
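
For reference, separating restore from build and capturing a binary log could look roughly like this. It is a sketch, not necessarily what any of the linked PRs did: `restore_then_build` is a hypothetical helper name, the `DOTNET` override exists only to allow a dry run, and the log file name build.binlog is an arbitrary choice.

```shell
# DOTNET can be overridden (e.g. for a dry run); defaults to the real CLI.
DOTNET="${DOTNET:-dotnet}"

# Hypothetical helper: restore first, then build without an implicit restore,
# writing an MSBuild binary log for later inspection.
restore_then_build() {
  solution="$1"
  "$DOTNET" restore "$solution" &&
    "$DOTNET" build -c Release --no-restore -bl:build.binlog "$solution"
}

# Example (solution filter name taken from earlier in the thread):
# restore_then_build Sentry-CI-Build-Windows.slnf
```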

Flash0ver avatar Aug 06 '25 09:08 Flash0ver

So the nuclear option, finally:

  • https://github.com/getsentry/sentry-dotnet/pull/4415

I don't think it matters too much. In practice, this code will only be executed on Android (never on Windows). We can and we do test it on macOS and Linux, which should be more than adequate.

jamescrosswell avatar Aug 06 '25 23:08 jamescrosswell