Halide floating point behavior
I'm having issues with a halide being a little too aggressive in it's handling of floating point calculations.
The issue at its core is that halide thinks that transforming (x+1)*n into (x*n)+n is a valid thing to do, and while that holds true for integer operations, floating point is less forgiving about this and will introduce small differences in the calculation, nvidia warns about this behavior of floating point math here .
Now this issue has come up before in issue #3549 which suggested the use of strict_float which did seem to initially solve the issue, it seems it was just better at masking it.
Even with strict_float applied, halide still seemingly randomly decides to mess with the order of things. I attached a heavily commented repro case in python (pastebin version available here).
Due to its use of strict_float it needs PR #3811 to function so make sure you have a recent build.
Took a stab at resolving this and postulated removing the AllowReassoc flag from the fastmath options here would resolve the issue. Kinda stumped it didn't, any suggestions on where to look next?
Found the cause, the solution is still a little unclear though, no_float_simplify is initialized to false here however nothing ever checks if the strict flag is on and changes this value, hence unsafe float optimizations are always allowed.
Hrm, it's supposed to be set to true when visiting the strict_float intrinsic in Simplify_Call.cpp
So maybe the bug is that strict_float wasn't injected somewhere it should have been?
yeah just saw that flag as well and it does seem to work, turning up the debug level in the codegen gives me
With lets: (let t0 = sin_f32(((float32(y)*78.233002f) + ((float32(x) + 1.000000f)*12.989800f))) in strict_float(((t0*43758.546875f) - floor_f32((t0*43758.546875f)))))
You may be onto something there, it seems to only partially mark the expression as strict float
Maybe CSE is pulling subexpressions out of strict_float incorrectly.
On Mon, Apr 15, 2019 at 10:03 AM LazyDodo [email protected] wrote:
yeah just saw that flag as well and it does seem to work, turning up the debug level in the codegen gives me
With lets: (let t0 = sin_f32(((float32(y)78.233002f) + ((float32(x) + 1.000000f)12.989800f))) in strict_float(((t043758.546875f) - floor_f32((t043758.546875f)))))
You may be onto something there, it seems to only partially mark the expression as strict float
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/halide/Halide/issues/3813#issuecomment-483336248, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfdRtermBZOyxg9w752afpOWBhlIKZYks5vhLD5gaJpZM4csk0l .
Part of the reason for wrapping the strict_float intrinsic around all Expr of floating-point type is to ensure all IR transformations see it. (The other big reason is to make sure simplification doesn't do any simplifying as the intrinsic will never match any of its patterns. It also cuts off visiting those parts of the tree, but even if it visits inside of them, it shouldn't do anything.) It is probably ok to CSE something so long as a) the strict_float occurrences inside it are preserved, and b) the substitution is also wrapped in strict_float. The latter would seem unlikely I guess.
I'm going to guess what is happening here is that x + 1 is an integer expression and with strict float the IR gets simplified from (x + 1) * strict_float(val) to x * strict_float(val) + strict_float(val) (or vice-versa). The work around would be to write e.g. (x + 1.0f) * val in the source code.
You actually had it right in your April 15 post, CSE is losing the strict_float attribute during its substitutions, debugged into it to the point where I could validate your suspicion but have lacked time since to actually sit down and fix it.
Please see https://github.com/halide/Halide/pull/4012 . I haven't been able to get the reported test failure to reproduce however. @LazyDodo can you try this and see if it has any effect.
Is this being strictified twice? Generating nested strict_floats: strict_float(strict_float(x)) seems suboptimal
Guess this comment was for #4012 ?
The test only applies strict_float once at the very outer level
import halide as hl
def hlrand(x,y):
tx = x * 12.9898
ty = y * 78.233
dot = ty + tx
sval = hl.sin(dot)
sval_mul = sval * 43758.5453123
fract = sval_mul - hl.floor(sval_mul)
Res = fract
return Res
def noise_strict(x,y, offset):
noise = hl.Func("noise")
noise[x,y] = hl.strict_float(hlrand( x + offset, y))
return noise
def main():
x,y = hl.Var("x"), hl.Var("y")
cpunoise_offset = noise_strict(x,y,1.0 )
cpunoise_offset.compile_to_lowered_stmt("cpunoise_offs.txt", [])
if __name__ == '__main__':
main()
Oops, yeah. I'll make the comment on that thread instead.