Evaluating DLSS on compute queue using Streamline fails due to Streamline transitioning resources into graphics-only states
Steps to reproduce: call 'slEvaluateFeature' for DLSS through Streamline using a command list that is executed on the compute queue.
Description:
Since DLSS does not, to my knowledge, use any graphics-pipeline functionality, it should be entirely possible to run it on the async compute queue.
However, when I pass 'slEvaluateFeature' a compute command list, and execute it on a compute queue, I get:
D3D12 ERROR: ID3D12CommandList::ResourceBarrier: D3D12_RESOURCE_STATES has invalid flags (0x80) for compute command list. [ RESOURCE_MANIPULATION ERROR #537: RESOURCE_BARRIER_INVALID_COMMAND_LIST_TYPE]
This happens because the DLSS plugin uses chi::ResourceState::eTextureRead for all of its texture inputs. That state maps to D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE (the 0x80 flag in the error above), which cannot be used on a compute command list and which I don't believe DLSS's compute shaders need.
I tried lying to the API by reporting my resources as already being in D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE or D3D12_RESOURCE_STATE_ALL_SHADER_RESOURCE, but Streamline still issues barriers involving D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE (possibly for an internal resource), so evaluation still fails.
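To make the failure concrete: the 0x80 in the error is D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, and D3D12_RESOURCE_STATE_ALL_SHADER_RESOURCE (0xC0) contains that same bit. Declaring the inputs as already being in either state therefore presumably just moves the invalid flag into the barrier's before-state. The sketch below mirrors the relevant flag values from d3d12.h. The helper name and the set of "graphics-only" bits are assumptions for illustration and are not part of D3D12 or Streamline; only 0x80 is confirmed by the error above.

```cpp
#include <cassert>
#include <cstdint>

// D3D12_RESOURCE_STATES values copied from d3d12.h. The real header is not
// included so this sketch compiles on any platform.
constexpr std::uint32_t kCommon                 = 0x0;
constexpr std::uint32_t kRenderTarget           = 0x4;
constexpr std::uint32_t kUnorderedAccess        = 0x8;
constexpr std::uint32_t kDepthWrite             = 0x10;
constexpr std::uint32_t kDepthRead              = 0x20;
constexpr std::uint32_t kNonPixelShaderResource = 0x40;
constexpr std::uint32_t kPixelShaderResource    = 0x80;
constexpr std::uint32_t kStreamOut              = 0x100;
constexpr std::uint32_t kCopySource             = 0x800;
constexpr std::uint32_t kAllShaderResource      = kNonPixelShaderResource | kPixelShaderResource;

// The flag reported by the debug-layer error (0x80) is PIXEL_SHADER_RESOURCE.
static_assert(kPixelShaderResource == 0x80, "matches the error message");

// Assumption: a non-exhaustive set of graphics-only state bits. Besides 0x80,
// these belong to pipeline stages (render target, depth, stream output) that a
// compute command list cannot use.
constexpr std::uint32_t kGraphicsOnlyStates =
    kRenderTarget | kDepthWrite | kDepthRead | kPixelShaderResource | kStreamOut;

// Hypothetical helper: a barrier is treated as legal on a compute list only if
// neither its before-state nor its after-state contains a graphics-only bit.
constexpr bool barrierValidOnCompute(std::uint32_t before, std::uint32_t after) {
    return ((before | after) & kGraphicsOnlyStates) == 0;
}
```

For example, barrierValidOnCompute(kAllShaderResource, kUnorderedAccess) is false because 0xC0 still carries the 0x80 bit. A texture that only compute kernels read could instead stay in NON_PIXEL_SHADER_RESOURCE (0x40), which passes the check.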
I also tried fixing it myself, but the build instructions did not work.
This isn't an expected use case for DLSS in Streamline. Can you explain further what you're trying to do? Also, please clarify which feature you're trying to use (SR, RR, FG).
You may have better luck using the lower-level SDK that Streamline wraps, i.e. https://github.com/NVIDIA/DLSS.
How did the build instructions not work?
The build problem was partly my fault: I didn't check setup.bat to see whether it needed any arguments, since the README didn't mention any. It just needed 'vs2022'.
As for what I'm doing: since DLSS (SR, in this case) is a pure compute effect, I expected to be able to use Streamline to run it on the async compute queue. That would let me, for example, overlap frame i's post-processing with frame i+1's geometry passes, and keep all of my post-processing on the compute queue, since moving it back to the graphics queue just for DLSS is silly.
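The overlap described above is normally expressed with a fence between the two queues: the graphics queue signals after frame i's geometry, and the compute queue waits on that value before post-processing frame i. The sketch below is a CPU-only model of those signal/wait semantics using threads. FenceModel, Timeline, runFrames and the event names are hypothetical; this is not D3D12 code, but the ordering it demonstrates is the kind of cross-queue dependency that ID3D12CommandQueue::Signal and ID3D12CommandQueue::Wait express.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// CPU-side stand-in for a GPU fence: signal() raises a monotonically
// increasing value, wait() blocks until the value reaches a target.
class FenceModel {
public:
    void signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > value_) value_ = value;
        }
        cv_.notify_all();
    }
    void wait(std::uint64_t target) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return value_ >= target; });
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t value_ = 0;
};

// Thread-safe timeline of events so the resulting schedule can be inspected.
class Timeline {
public:
    void add(std::string event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(std::move(event));
    }
    std::vector<std::string> snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_;
    }
private:
    std::mutex mutex_;
    std::vector<std::string> events_;
};

// Hypothetical frame loop: the "graphics" thread records geometry for frame i
// and then signals i + 1; the "compute" thread waits for i + 1 before running
// post-processing and upscaling for frame i.
std::vector<std::string> runFrames(int frameCount) {
    FenceModel geometryDone;
    Timeline timeline;
    std::thread graphics([&] {
        for (int i = 0; i < frameCount; ++i) {
            timeline.add("geometry " + std::to_string(i));
            geometryDone.signal(static_cast<std::uint64_t>(i) + 1);
        }
    });
    std::thread compute([&] {
        for (int i = 0; i < frameCount; ++i) {
            geometryDone.wait(static_cast<std::uint64_t>(i) + 1);
            timeline.add("post+upscale " + std::to_string(i));
        }
    });
    graphics.join();
    compute.join();
    return timeline.snapshot();
}

// Position of an event in the timeline (events.size() if absent).
std::size_t indexOf(const std::vector<std::string>& events, const std::string& name) {
    return static_cast<std::size_t>(
        std::find(events.begin(), events.end(), name) - events.begin());
}
```

With runFrames(3), "geometry i" is always logged before "post+upscale i" for each frame, while geometry for later frames is free to run ahead of earlier post-processing. A real renderer would also need a fence in the other direction so the graphics queue does not overwrite frame i's targets before the compute queue has finished with them.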
However, SL seems to do some work that expects to run on the graphics queue (maybe for ImGui? I built it in release mode, though, which appears to be set up not to compile the ImGui code). It also transitions input resources into D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE (through chi::ResourceState::eTextureRead), where I would expect it to use chi::ResourceState::eStorageRead instead.
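A hypothetical sketch of the kind of change suggested above: derive the read state from the stages that actually consume the texture, instead of always using PIXEL_SHADER_RESOURCE. Assuming, as the reporter does, that DLSS only reads its inputs from compute work, this yields NON_PIXEL_SHADER_RESOURCE (0x40), which a compute command list accepts. The stage mask and function name are illustrative only; they are not Streamline's chi::ResourceState API, and the sketch makes no claim about how eStorageRead itself maps to D3D12 states.

```cpp
#include <cassert>
#include <cstdint>

// D3D12_RESOURCE_STATES values copied from d3d12.h.
constexpr std::uint32_t kNonPixelShaderResource = 0x40;
constexpr std::uint32_t kPixelShaderResource    = 0x80;

// Hypothetical consumer-stage mask (not a Streamline or D3D12 type).
constexpr std::uint32_t kReadByPixelShader   = 1u << 0;
constexpr std::uint32_t kReadByComputeShader = 1u << 1;

// Returns the SRV read state covering exactly the requested consumers,
// rather than unconditionally using PIXEL_SHADER_RESOURCE.
constexpr std::uint32_t shaderReadState(std::uint32_t consumers) {
    std::uint32_t state = 0;
    if (consumers & kReadByPixelShader)   state |= kPixelShaderResource;
    if (consumers & kReadByComputeShader) state |= kNonPixelShaderResource;
    return state;
}

// A compute-only consumer never picks up the graphics-only 0x80 bit.
static_assert(shaderReadState(kReadByComputeShader) == 0x40, "compute-only read");
```

Only textures that are also sampled by pixel shaders would end up with the 0x80 bit, and those barriers would have to be recorded on the graphics queue anyway.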
Overall, it looks like Streamline isn't set up to support multi-queue applications, which is quite unfortunate given that it's targeting DX12/VK.
I'm guessing I should just integrate the lower-level APIs directly?
It's probably possible to get SL's SR plugin working on a compute queue, but that likely doesn't extend to the other plugins, i.e. SL in general.
There are other integrations that use SR or RR in compute contexts, but I believe they all use the lower-level NGX APIs directly.