Games crash on launch w/ NVIDIA and AMD drivers/hardware loaded/enabled at the same time.

Open jhnphm opened this issue 3 years ago • 14 comments

I'm using VFIO for the occasional incompatible Windows game. No game completes startup w/ Proton 5.13+ (tried 7.x, Experimental, etc.) whenever the NVIDIA card is bound to the host. My main display runs off the AMD iGPU and I'm launching w/ prime-run steam, but the problem manifests with or without prime-run. If I unbind the NVIDIA card, Proton runs fine. Versions of Proton < 5.13 also run fine.

Issue seems similar to https://github.com/ValveSoftware/Proton/issues/6180

I'm using Arch Linux, Ryzen 5700G, nVidia 3070

Logs attached:

slr-app837470-t20221007T121808.log steam-837470.log sysinfo.log

Console log:

/bin/sh\0-c\0PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'\0
Game process added : AppID 837470 "PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'", ProcID 14418, IP 0.0.0.0:0
chdir /home/john/.local/share/Steam/steamapps/common/Untitled Goose Game
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_64/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
GameAction [AppID 837470, ActionID 1] : LaunchApp changed task to WaitingGameWindow with ""
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
GameAction [AppID 837470, ActionID 1] : LaunchApp changed task to Completed with ""
ThreadGetProcessExitCode: no such process 14529
ThreadGetProcessExitCode: no such process 14527
ThreadGetProcessExitCode: no such process 14420
Game process updated : AppID 837470 "PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'", ProcID 14528, IP 0.0.0.0:0
Installing breakpad exception handler for appid(steam)/version(1665100899)
Installing breakpad exception handler for appid(steam)/version(1665100899)
Steam: An X Error occurred
X Error of failed request:  BadMatch (invalid parameter attributes)
Major opcode of failed request:  148
Serial number of failed request:  338
xerror_handler: X failed, continuing

jhnphm, Oct 07 '22 16:10

Hello @jhnphm, please copy your system information from Steam (Steam -> Help -> System Information) and put it in a gist, then include a link to the gist in this issue report.

kisak-valve, Oct 07 '22 16:10

Hello @jhnphm, please copy your system information from Steam (Steam -> Help -> System Information) and put it in a gist, then include a link to the gist in this issue report.

I've copied it into the updated post above in sysinfo.log but also here: https://gist.github.com/jhnphm/f9e45d04d374cb9613386ac094b5e50a

jhnphm, Oct 07 '22 16:10

Thanks, AMDVLK has a history of breaking other Vulkan driver implementations. If you remove / disable AMDVLK and use mesa/RADV instead, are you able to reproduce this scenario?

kisak-valve, Oct 07 '22 16:10

Yes.

(below log is running on Proton 7.x):

slr-app837470-t20221007T124252.log steam-837470.log

jhnphm, Oct 07 '22 16:10

12:42:52.860029: pressure-vessel-wrap[27962]: I: Vulkan ICD #0 at /usr/share/vulkan/icd.d/amd_icd32.json: /usr/lib32/amdvlk32.so

AMDVLK is still in the mix in your test.
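A quick way to double-check which Vulkan ICDs are still installed before re-testing is to read the manifests directly. The following is only a rough sketch: it assumes the manifests live in /usr/share/vulkan/icd.d (the directory seen in the log line above; other locations such as /etc/vulkan/icd.d are not scanned), and the function names and the "amdvlk" substring match are illustrative guesses, not part of any Steam or Vulkan tool.

```python
import json
from pathlib import Path


def list_vulkan_icds(icd_dir="/usr/share/vulkan/icd.d"):
    """Return (manifest file name, library_path) pairs for each ICD manifest."""
    found = []
    for manifest in sorted(Path(icd_dir).glob("*.json")):
        try:
            data = json.loads(manifest.read_text())
        except (OSError, ValueError):
            continue  # skip unreadable or malformed manifests
        icd = data.get("ICD") if isinstance(data, dict) else None
        library = icd.get("library_path") if isinstance(icd, dict) else None
        if library:
            found.append((manifest.name, library))
    return found


def amdvlk_manifests(icds):
    """Heuristic (assumption): a library path containing 'amdvlk' is AMDVLK."""
    return [name for name, library in icds if "amdvlk" in library.lower()]


if __name__ == "__main__":
    icds = list_vulkan_icds()
    for name, library in icds:
        print(f"{name}: {library}")
    leftovers = amdvlk_manifests(icds)
    if leftovers:
        print("AMDVLK still present:", ", ".join(leftovers))
```

With both the 64-bit and 32-bit AMDVLK packages removed, the last line should not be printed.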

kisak-valve, Oct 07 '22 16:10

Ah, left the 32-bit amdvlk in the mix. New test:

slr-app837470-t20221007T130801.log steam-837470.log

jhnphm, Oct 07 '22 17:10

For reference, this is a working run w/ the NVIDIA GPU unbound, run w/o prime-run: slr-app837470-t20221007T134748.log steam-837470.log

For apples to apples, nonworking run, NVIDIA GPU bound, w/o prime-run: slr-app837470-t20221007T135131.log steam-837470.log

A working NVIDIA GPU bound, w/o prime-run, on Proton 5.0:

steam-837470.log (couldn't find the steam runtime logfiles for some reason)

A working NVIDIA GPU bound, w/ prime-run, on Proton 5.0:

steam-837470.log

slr-app1420170-t20221007T135842.log

Basically, the combination of Proton 5.13+ AND the NVIDIA GPU being bound to the host is what breaks, even if the GPU is not active (it makes no difference whether prime-run is used or not).
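To record which state the GPU was in for each log, the driver binding can be read from sysfs, where each PCI device directory has a `driver` symlink while a kernel driver is bound. This is a minimal sketch; the PCI address `0000:01:00.0` is a placeholder and not taken from this system, and the function name is made up for illustration.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the kernel driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None  # no driver symlink: device missing or nothing bound
    # The symlink points at .../bus/pci/drivers/<name>; keep only <name>.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Placeholder address (assumption); look up the real one with lspci.
    print(bound_driver("0000:01:00.0"))
```

On a VFIO setup this would be expected to report something like `nvidia` while the card is bound to the host and `vfio-pci` while it is reserved for the VM.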

jhnphm, Oct 07 '22 17:10

Actually, I'm not even able to launch winecfg in the prefix w/ the NVIDIA GPU bound:

john@thor [02:27:47 PM] [~] 
-> % export GAMEID=837470
john@thor [02:28:11 PM] [~] 
-> % WINEPREFIX=~/.steam/steam/steamapps/compatdata/$GAMEID/pfx/ WINEARCH=win64 .steam/steam/steamapps/common/Proton\ 7.0/dist/bin/wine64 'winecfg.exe'
wineserver: using server-side synchronization.
wine: RLIMIT_NICE is <= 20, unable to use setpriority safely
wine: Unhandled page fault on execute access to 00007F2D614EF3D0 at address 00007F2D614EF3D0 (thread 00cc), starting debugger...
00c4:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
00c4:err:winediag:nodrv_CreateWindow The explorer process failed to start.
john@thor [02:28:15 PM] [~] 

jhnphm, Oct 07 '22 18:10

Installing vulkan-mesa-layers/lib32-vulkan-mesa-layers (https://bbs.archlinux.org/viewtopic.php?id=279672) helps running winecfg and untitled goose game directly w/ proton, but it still breaks if prime-run is enabled or if it's run through steam w/ the common error signature:

00c4:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
00c4:err:winediag:nodrv_CreateWindow The explorer process failed to start.
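For comparing many runs, the signature can be picked out of a log mechanically. A small sketch, assuming the log text (for example steam-837470.log produced with PROTON_LOG=1) has already been read into a string; the function name is only illustrative.

```python
SIGNATURE = "err:winediag:nodrv_CreateWindow"


def find_nodrv_lines(log_text):
    """Return every log line that carries the 'no driver could be loaded' signature."""
    return [line for line in log_text.splitlines() if SIGNATURE in line]


if __name__ == "__main__":
    sample = (
        "wineserver: using server-side synchronization.\n"
        "00c4:err:winediag:nodrv_CreateWindow Application tried to create a window, "
        "but no driver could be loaded.\n"
        "00c4:err:winediag:nodrv_CreateWindow The explorer process failed to start.\n"
    )
    for line in find_nodrv_lines(sample):
        print(line)
```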

Potentially related: https://www.reddit.com/r/linux_gaming/comments/rvzu5p/cant_run_winelutrisproton_apps_on_a_gpu_thats_not/ . It looks like I can get this to work, at least to start winecfg from the command line, if I bind the GPU before starting X, but that means I can no longer unbind it for passing it through to a VM w/o restarting X. Running untitled goose game from steam still doesn't work though.

Most other native applications like vkcube, as well as Proton <= 5.0, work fine on the NVIDIA dGPU even when Xorg wasn't started after binding the GPU, so it does seem like a Proton/Wine regression.

jhnphm, Oct 07 '22 18:10

I can get prime-run to work w/ the scripts generated by using PROTON_DUMP_DEBUG_COMMANDS if I switch to Wayland, but I still can't get it to run via the Steam GUI. Looks like bypassing the Steam runtime with the Arch steam-native script works too.

jhnphm, Oct 07 '22 22:10

This might be a Proton regression, but you said that Proton <= 5.0 is good and 5.13+ is bad, which suggests that one important factor might be whether you're using the SteamLinuxRuntime_soldier container runtime (which is used by Proton 5.13+, and optionally for native Linux games) or not (Proton <= 5.0 and most native Linux games).

However, there were also a lot of non-container-runtime-related changes between Proton 5.0 and 5.13, so it's also possible that this is genuinely a Proton problem and nothing to do with the container runtime.

Multi-GPU is complicated, Proton is complicated, and SteamLinuxRuntime is complicated, so the combination of the three gets very confusing. Please try to narrow down where the problem is, with as few complicated things involved as possible:

  • Get the overall system into the state where (some? all?) games are crashing on launch.
  • Get the Help -> System Information while in that state (this runs some simple diagnostic tools). The Gist you provided was before removing AMDVLK, so its results are not necessarily the same as what you're seeing now. If you alter the system state during testing (binding/unbinding the GPU, etc.), please get a new System Information dump matching each log, so that we can compare them.
  • Install a native Linux game that uses OpenGL. Counter-Strike: Global Offensive is free-to-play and actively maintained, and uses OpenGL by default. Floating Point is a much simpler free-to-play OpenGL game which can be useful to get a baseline for what a very simple scenario looks like.
  • Also install a native Linux game that uses Vulkan. CS:GO will use Vulkan if you set its launch options to %command% -vulkan, which makes it useful for apples-to-apples comparisons between OpenGL and Vulkan.
  • In the Properties of each of those games, go into the Compatibility tab, check Force the use of a specific Steam Play compatibility tool, and choose Steam Linux Runtime from the list. This will result in those games running in a SteamLinuxRuntime_soldier container (the same as Proton 5.13+) with some compatibility glue to provide the same libraries as the traditional scout Steam Runtime.
  • Try launching those games, and see whether they work or not.
  • If they work but Proton games do not, then this is probably a Proton problem.
  • If the native Linux games have the same issues in Steam Linux Runtime as the Proton games did, then this is probably a Steam Linux Runtime problem. To confirm, uncheck Force the use of a specific Steam Play compatibility tool for each game and try again.
  • If CS:GO works when the Launch Options are left blank but fails when they're set to %command% -vulkan, then this is a Vulkan-specific problem. Recent versions of Proton also use Vulkan when emulating most DirectX versions.
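The last three points amount to a small decision table, which might be sketched as below. This is only an illustration of the reasoning in the list; the function name, argument names and result strings are invented here, and combinations the list does not cover are reported as inconclusive.

```python
def likely_culprit(native_opengl_ok, native_vulkan_ok, proton_ok):
    """Map test outcomes to the component most likely at fault.

    native_opengl_ok / native_vulkan_ok: did the native Linux game launch in
    the Steam Linux Runtime container using OpenGL / using Vulkan?
    proton_ok: did the Proton game launch?
    """
    if native_opengl_ok and not native_vulkan_ok:
        return "Vulkan-specific problem"
    if not native_opengl_ok and not native_vulkan_ok:
        # Confirm by unchecking the forced compatibility tool and retrying.
        return "probably Steam Linux Runtime"
    if native_opengl_ok and native_vulkan_ok and not proton_ok:
        return "probably Proton"
    # Everything works, or only the OpenGL run fails: not covered by the list.
    return "inconclusive"


if __name__ == "__main__":
    print(likely_culprit(native_opengl_ok=True, native_vulkan_ok=True, proton_ok=False))
```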

smcv, Oct 11 '22 14:10

A working NVIDIA GPU bound, w/o prime-run, on Proton 5.0: (couldn't find the steam runtime logfiles for some reason)

The SteamLinuxRuntime_soldier container runtime is not used for Proton 5.0, so it is correct and expected that you will not get a SteamLinuxRuntime_soldier/var/slr-*.log for Proton 5.0 games.

A working NVIDIA GPU bound, w/ prime-run, on Proton 5.0: steam-837470.log slr-app1420170-t20221007T135842.log

These logs don't match: if it was using Proton 5.0, then you wouldn't get a slr-*.log for that run. slr-app1420170-t20221007T135842.log seems to be an unrelated log from running Proton\ 5.13/proton run /home/john/.local/share/Steam/ubuntu12_32/../bin/d3ddriverquery64.exe (see the first line).

smcv, Oct 11 '22 14:10

-> % WINEPREFIX=~/.steam/steam/steamapps/compatdata/$GAMEID/pfx/ WINEARCH=win64 .steam/steam/steamapps/common/Proton\ 7.0/dist/bin/wine64 'winecfg.exe'

This is unsupported: Proton 5.13+ is intended to always be run in the SteamLinuxRuntime_soldier container environment, not on the host system. However, if this is also failing with the same symptoms as in the container runtime, then that suggests that the problem might be with Proton and not the container runtime.

Looks like bypassing the steam runtime with the arch steam-native script works too

This is also unsupported: the steam-for-linux binaries are intended to always be run with the (older, LD_LIBRARY_PATH-based) Steam Runtime, which is what steam-native disables. Scripts in the Steam Runtime are responsible for choosing whether to take each library from your host system or from the runtime (in most cases whichever one is newer must be used).

I'm surprised that steam-native has any effect on the container runtime - it only disables the older, LD_LIBRARY_PATH-based runtime mechanism (used by Steam itself, Proton <= 5.0 and most native Linux games) and shouldn't do anything to the container runtime. If steam-native vs. steam-runtime makes a difference, then there must be some relatively subtle interaction going on.

Are you sure you are running steam-native in exactly the same way that you were running Steam with the normal Steam Runtime enabled, so that the only difference is -native or not?

One thing that might be significant here is that if you run Steam from a desktop environment shortcut, most desktop environments will try to launch it on a discrete or non-default GPU using PRIME or similar (via PrefersNonDefaultGPU=true and X-KDE-RunOnDiscreteGpu=true), but if you run it from a command-line prompt, that will not take effect. So I wonder whether the difference might really be that you are running steam-native from a terminal (therefore on your default GPU), but running Steam in its normal supported mode from a desktop shortcut (therefore on your discrete GPU)?
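To see whether a given shortcut asks for the discrete GPU, the two keys mentioned above can be looked up in its .desktop file. A rough sketch using Python's configparser, which handles typical .desktop files but is not a full Desktop Entry parser; the function name is illustrative, and where the Steam shortcut is installed on a given distribution is not assumed here.

```python
import configparser

GPU_KEYS = ("PrefersNonDefaultGPU", "X-KDE-RunOnDiscreteGpu")


def prefers_discrete_gpu(desktop_text):
    """Return True if the [Desktop Entry] group sets either GPU key to true."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # .desktop keys are case-sensitive
    parser.read_string(desktop_text)
    if not parser.has_section("Desktop Entry"):
        return False
    entry = parser["Desktop Entry"]
    return any(
        entry.get(key, fallback="").strip().lower() == "true" for key in GPU_KEYS
    )


if __name__ == "__main__":
    sample = "[Desktop Entry]\nName=Steam\nExec=/usr/bin/steam %U\nPrefersNonDefaultGPU=true\n"
    print(prefers_discrete_gpu(sample))
```

Passing in the contents of the shortcut that is actually clicked, and comparing with how Steam is started from the terminal, would show whether the two launches really target the same GPU.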

smcv, Oct 11 '22 14:10

More recent sysinfo w/ amdvlk disabled: https://gist.github.com/jhnphm/535dc9ee4154fee34648c712fc357eab

CS:GO works natively w/ both OpenGL and Vulkan, and also w/ the compatibility tool set to Steam Linux Runtime, so it really seems to be a Proton issue as opposed to a runtime issue.

The steam-native thing seems to be a red herring; I probably messed up some testing w/ the GPU in a bad state, or hit some other weird transient problem. I can get Steam running Proton games w/ the latest Proton normally w/ the dGPU bound under Wayland, though.

It might have something to do w/ binding the GPU after Xorg is started, which I do to keep Xorg from binding to it and making it un-unbindable for VMs w/o restarting the DE. [EDIT: Nope, makes no difference.]

Multi-GPU used to work on Xorg when I was using an AMD dGPU w/ an AMD iGPU, but the AMD card (Vega64) had other issues w/ VFIO that necessitated running Xorg instead of Wayland. Since it all works under Wayland now, I guess I can just use that, but if it's useful to chase this down I can provide more information.

Wayland sysinfo: https://gist.github.com/jhnphm/d378f7601301736401c72c684f6c6e3d

jhnphm, Oct 12 '22 14:10