Intrinsics ddx_fine/ddy_fine should not be allowed to sink into flow control
Description
Intrinsics ddx_fine/ddy_fine should not be allowed to sink into flow control. This can currently happen because they are marked ReadNone, the same as other unary ops, even though they read values from neighboring threads in the same quad.
Steps to Reproduce
- Open tools/clang/unittests/HLSLExec/ShaderOpArith.xml, find the data for HelperLaneTestNoWave, and remove -opt-disable sink from the shader compile arguments.
- Run hcttest exec-filter *HelperLaneTest -verbose -adapter NVidia*
Actual Behavior
StartGroup: ExecutionTest::HelperLaneTest
Verifying IsHelperLane in shader model 6.0
Using Adapter:NVIDIA GeForce GTX 1660 Ti
Error: Verify: AreEqual(pTestResults[0].is_helper_00, 0) - Values (1, 0) [File: D:\dxc2\tools\clang\unittests\HLSLExec\ExecutionTest.cpp, Function: ExecutionTest::HelperLaneTest, Line: 10939]
EndGroup: ExecutionTest::HelperLaneTest [Failed]
Expected Behavior
StartGroup: ExecutionTest::HelperLaneTest
Verifying IsHelperLane in shader model 6.0
Using Adapter:NVIDIA GeForce GTX 1660 Ti
Verifying IsHelperLane in shader model 6.6
Using Adapter:NVIDIA GeForce GTX 1660 Ti
EndGroup: ExecutionTest::HelperLaneTest [Passed]
More info
This shader code
int is_helper_accross_X = ReadAcrossX_DD(is_helper, isLeft);
int is_helper_accross_Y = ReadAcrossY_DD(is_helper, isTop);
int is_helper_accross_Diag = ReadAcrossDiagonal_DD(is_helper, isLeft, isTop);
if (!isLeft && !isTop) { // bottom-right pixel writes results
  g_testResults[i].is_helper_00 = is_helper_accross_Diag;
  g_testResults[i].is_helper_10 = is_helper_accross_Y;
  g_testResults[i].is_helper_01 = is_helper_accross_X;
  g_testResults[i].is_helper_11 = is_helper;
}
is optimized into code equivalent to
int is_helper_accross_X = ReadAcrossX_DD(is_helper, isLeft);
if (!isLeft && !isTop) { // bottom-right pixel writes results
  int is_helper_accross_Y = ReadAcrossY_DD(is_helper, isTop);
  int is_helper_accross_Diag = ReadAcrossDiagonal_DD(is_helper, isLeft, isTop);
  g_testResults[i].is_helper_00 = is_helper_accross_Diag;
  g_testResults[i].is_helper_10 = is_helper_accross_Y;
  g_testResults[i].is_helper_01 = is_helper_accross_X;
  g_testResults[i].is_helper_11 = is_helper;
}
The ReadAcrossDiagonal_DD function reads a value from a neighboring thread in the same quad. After the optimization, that value has never been calculated: this call and ReadAcrossY_DD were moved under the if, so they no longer execute on all threads in the quad, only on the bottom-right one.
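To make the failure mode concrete, here is a small Python toy model of a 2x2 quad. It is only a sketch under stated assumptions, not the actual test shader or driver behavior: it assumes ReadAcrossX_DD/ReadAcrossY_DD are built from ddx_fine/ddy_fine as "own value plus or minus the derivative", that the diagonal read is a Y read of an X read, and that a lane which skipped an operation leaves its result undefined (modeled as None). All function names below are hypothetical stand-ins.

```python
# Toy model of one 2x2 pixel quad (NOT real HLSL semantics, just an illustration).
# Lane layout: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
UNDEFINED = None  # stands in for "this lane never computed the value"

def ddx_fine(v, lane):
    """Right minus left within the lane's row; undefined if an input is missing."""
    left, right = lane & ~1, lane | 1
    if v[left] is UNDEFINED or v[right] is UNDEFINED:
        return UNDEFINED
    return v[right] - v[left]

def ddy_fine(v, lane):
    """Bottom minus top within the lane's column; undefined if an input is missing."""
    top, bottom = lane & 1, lane | 2
    if v[top] is UNDEFINED or v[bottom] is UNDEFINED:
        return UNDEFINED
    return v[bottom] - v[top]

def read_across(v, active, derivative, is_first):
    """Each active lane recovers its neighbor's value as own value +/- derivative.
    Lanes that are not active produce no result at all."""
    out = [UNDEFINED] * 4
    for lane in active:
        d = derivative(v, lane)
        if d is not UNDEFINED:
            out[lane] = v[lane] + (d if is_first(lane) else -d)
    return out

def read_across_x(v, active):
    return read_across(v, active, ddx_fine, lambda lane: lane % 2 == 0)

def read_across_y(v, active):
    return read_across(v, active, ddy_fine, lambda lane: lane < 2)

def read_across_diagonal(v, active):
    # Assumed composition: the Y read consumes the X read's result on every lane.
    return read_across_y(read_across_x(v, active), active)

is_helper = [1, 0, 0, 0]   # hypothetical input: only the top-left lane is a helper
ALL_LANES = {0, 1, 2, 3}   # calls executed before the if: every lane participates
BOTTOM_RIGHT = {3}         # calls sunk into the if: only lane 3 participates

hoisted = read_across_diagonal(is_helper, ALL_LANES)[3]
sunk = read_across_diagonal(is_helper, BOTTOM_RIGHT)[3]
print("hoisted:", hoisted)  # value of the diagonal (top-left) lane
print("sunk:", sunk)        # undefined: the neighbor never ran the X read
```

In this model the hoisted version lets the bottom-right lane recover the top-left lane's value, while the sunk version has nothing valid to read from the top-right lane. On real hardware the result is not a clean "undefined" marker but whatever the GPU happens to produce, which would be consistent with the wrong is_helper_00 value in the test output above.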
Environment
- DXC version 634c2f349 or later (main branch from July 28 forward, after HLSL 2021 was enabled by default).
- Windows
I am hitting this problem. Is there a known workaround?
The compile flag -opt-disable sink should help with that.