binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

[Emitting Zero] [Local Tracking] General rule might not handle unary ne emitting zero bits well

Open xuruiyang2002 opened this issue 10 months ago • 3 comments

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $foo (result i32)
    i32.const 0
    i32.load)
  (func $_start (param $0 i32) (param $1 i64) (param $2 i32) 
    (local $3 i32) (local $4 i32) (local $5 i32) (local $6 i32) (local $7 i32) (local $9 i32)
    call $foo
    local.set $4
    i32.const 1
    local.set $3
    local.get $4
    local.get $3
    i32.shl
    i32.const 1
    i32.shr_s
    local.set $6
    i32.const 0
    i32.const 0
    i32.store
    i32.const -259031342
    local.set $7
    local.get $6
    local.get $7
    i32.ne
    local.set $9
    block  ;; label = @1
      local.get $9
      i32.eqz
      br_if 0 (;@1;)
      unreachable
    end
    call $bar)
  (func $bar call $external_function)
  (memory $0 258 258)
  (export "_start" (func $_start)))

wasm-opt (755a8d0e) should deduce the condition to false, thus fold the branch to unreachable (further deleting call $bar), which works under -O2 but fails under -O3.

Below is optimized by -all -O3:

(func $_start (type $1) (param $0 i32) (param $1 i64) (param $2 i32)
  (local.set $0
   (i32.shr_s
    (i32.shl
     (i32.load
      (i32.const 0)
     )   
     (i32.const 1)
    )   
    (i32.const 1)
   )   
  )
  (i32.store
   (i32.const 0)
   (i32.const 0)
  )
  (if 
   (i32.ne
    (local.get $0) 
    (i32.const -259031342)
   )   
   (then
    (unreachable)
   )   
  )
  (call $external_function)
 )

As you can see, the condition here

   (i32.ne
    (local.get $0) 
    (i32.const -259031342)
   )   

is not deduced to true.

Similar to #7492, Maybe there a missing rule for it to emit zero bits, or is the local tracking insufficient?

xuruiyang2002 avatar Apr 14 '25 07:04 xuruiyang2002

Yes, limitations of local tracking hit us here: we don't see all the contents in the local, just part of the pattern we are looking for.

I think, however, that a good fix for this would actually be in SimplifyLocals. That will normally move the local into the use:

  (local.set $0
   (i32.shr_s
    (i32.shl
     (i32.load
      (i32.const 0)
     )
     (i32.const 1)
    )
    (i32.const 1)
   )
  )
  (if
   (i32.ne
    (local.get $0)
    (i32.const -259031342)
   )
   (then
    (unreachable)
   )
  )

=>

  (if
   (i32.ne
    (i32.shr_s
     (i32.shl
      (i32.load
       (i32.const 0)
      )
      (i32.const 1)
     )
     (i32.const 1)
    )
    (i32.const -259031342)
   )
   (then
    (unreachable)
   )
  )

(and remove the local.set). The problem here is the inner load, which has effects - that pass sees effects on the local.set's value, and gives up. We could do one of two things:

  1. Make it find the part without effects, and move only that.
  2. Add a pass that un-optimizes nested effects like that. That is,
(local.set $0
 (i32.eqz
  (i32.load)
 )
)

=>

(local.set $inner
 (i32.load)
)
(local.set $0
 (i32.eqz
  (local.get $inner)
 )
)

This is the opposite of what SimplifyLocals does, but it would solve a whole range of these issues, I think.

kripken avatar Apr 16 '25 23:04 kripken

That was a clear explanation; I understand now.

The i32.load (with side-effect) in the calculation for $0 prevents the SimplifyLocals from inlining calculation into the if condition. Without such inlining, Precompute-propagation doesn't see the full picture, thus fails to do further optimizations.

Sounds the first solution is limited, second is more general. Add a new pass which could find operations with side-effect and isolate them with temporary variable, thus SimplifyLocals could do inlining as normal.

But how much do these solutions cost? Is it good enough to fix #7492?

xuruiyang2002 avatar Apr 17 '25 02:04 xuruiyang2002

It wouldn't fix that particular issue - the problem there wasn't an inner expression with effects like the load in the last example.

But it would fix this, and iirc a few others. This is a general situation we don't yet handle well enough. Flatten was, in theory, the solution to this, but it does a lot more:

https://github.com/WebAssembly/binaryen/blob/e185ff94852efdab9d6d75fa542a7b5b5c785d27/src/ir/flat.h#L45-L55

We need 1 and 3 from that list, but not 2 and 4. Perhaps a variant of the Flatten pass that only does them is an option here, "minimally flatten" or "flatten expressions (but not control flow)".

kripken avatar Apr 18 '25 23:04 kripken