Halide Halide floating point behavior

I'm having issues with a halide being a little too aggressive in it's handling of floating point calculations.

The issue at its core is that halide thinks that transforming (x+1)*n into (x*n)+n is a valid thing to do, and while that holds true for integer operations, floating point is less forgiving about this and will introduce small differences in the calculation, nvidia warns about this behavior of floating point math here .

Now this issue has come up before in issue #3549 which suggested the use of strict_float which did seem to initially solve the issue, it seems it was just better at masking it.

Even with strict_float applied, halide still seemingly randomly decides to mess with the order of things. I attached a heavily commented repro case in python (pastebin version available here).

Due to its use of strict_float it needs PR #3811 to function so make sure you have a recent build.

noise_cpu.zip

Apr 12 '19 17:04 LazyDodo

Took a stab at resolving this and postulated removing the AllowReassoc flag from the fastmath options here would resolve the issue. Kinda stumped it didn't, any suggestions on where to look next?

Apr 14 '19 22:04 LazyDodo

Found the cause, the solution is still a little unclear though, no_float_simplify is initialized to false here however nothing ever checks if the strict flag is on and changes this value, hence unsafe float optimizations are always allowed.

Apr 15 '19 16:04 LazyDodo

Hrm, it's supposed to be set to true when visiting the strict_float intrinsic in Simplify_Call.cpp

Apr 15 '19 16:04 abadams

So maybe the bug is that strict_float wasn't injected somewhere it should have been?

Apr 15 '19 16:04 abadams

yeah just saw that flag as well and it does seem to work, turning up the debug level in the codegen gives me

With lets: (let t0 = sin_f32(((float32(y)*78.233002f) + ((float32(x) + 1.000000f)*12.989800f))) in strict_float(((t0*43758.546875f) - floor_f32((t0*43758.546875f)))))

You may be onto something there, it seems to only partially mark the expression as strict float

Apr 15 '19 17:04 LazyDodo

Maybe CSE is pulling subexpressions out of strict_float incorrectly.

On Mon, Apr 15, 2019 at 10:03 AM LazyDodo [email protected] wrote:

yeah just saw that flag as well and it does seem to work, turning up the debug level in the codegen gives me

With lets: (let t0 = sin_f32(((float32(y)78.233002f) + ((float32(x) + 1.000000f)12.989800f))) in strict_float(((t043758.546875f) - floor_f32((t043758.546875f)))))

You may be onto something there, it seems to only partially mark the expression as strict float

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/halide/Halide/issues/3813#issuecomment-483336248, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfdRtermBZOyxg9w752afpOWBhlIKZYks5vhLD5gaJpZM4csk0l .

Apr 15 '19 17:04 abadams

Part of the reason for wrapping the strict_float intrinsic around all Expr of floating-point type is to ensure all IR transformations see it. (The other big reason is to make sure simplification doesn't do any simplifying as the intrinsic will never match any of its patterns. It also cuts off visiting those parts of the tree, but even if it visits inside of them, it shouldn't do anything.) It is probably ok to CSE something so long as a) the strict_float occurrences inside it are preserved, and b) the substitution is also wrapped in strict_float. The latter would seem unlikely I guess.

Apr 15 '19 18:04 zvookin

I'm going to guess what is happening here is that x + 1 is an integer expression and with strict float the IR gets simplified from (x + 1) * strict_float(val) to x * strict_float(val) + strict_float(val) (or vice-versa). The work around would be to write e.g. (x + 1.0f) * val in the source code.

Jun 18 '19 21:06 zvookin

You actually had it right in your April 15 post, CSE is losing the strict_float attribute during its substitutions, debugged into it to the point where I could validate your suspicion but have lacked time since to actually sit down and fix it.

Jun 19 '19 00:06 LazyDodo

Please see https://github.com/halide/Halide/pull/4012 . I haven't been able to get the reported test failure to reproduce however. @LazyDodo can you try this and see if it has any effect.

Jul 15 '19 21:07 zvookin

Is this being strictified twice? Generating nested strict_floats: strict_float(strict_float(x)) seems suboptimal

Jul 15 '19 23:07 abadams

Guess this comment was for #4012 ?

The test only applies strict_float once at the very outer level

import halide as hl

def hlrand(x,y):
    tx = x * 12.9898
    ty = y * 78.233
    dot = ty + tx
    sval = hl.sin(dot)
    sval_mul = sval * 43758.5453123
    fract = sval_mul - hl.floor(sval_mul)
    Res = fract
    return Res

def noise_strict(x,y, offset):
    noise = hl.Func("noise")
    noise[x,y] = hl.strict_float(hlrand( x + offset, y))
    return noise
    
def main():
    x,y = hl.Var("x"), hl.Var("y")
 
    cpunoise_offset = noise_strict(x,y,1.0 )
    cpunoise_offset.compile_to_lowered_stmt("cpunoise_offs.txt", [])

 
if __name__ == '__main__':
    main()

Jul 15 '19 23:07 LazyDodo

Oops, yeah. I'll make the comment on that thread instead.

Jul 15 '19 23:07 abadams