binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

local-cse in O3 introduces side-effecting across structurally isolated sub-trees thus prevent const prop & DCE

Open xuruiyang2002 opened this issue 10 months ago • 2 comments

Given the following code:

(module
  (import "External" "external_function" (func $external_function))
  (func $_start (param $0 i64) (param $1 i32) (param $2 i32) (param $3 i32)
    (local $4 i32) (local $5 i32)
    i32.const 0
    call $foo
    drop
  )
  (func $foo (param $0 i32) (result i32)
    (local $1 i32) (local $2 i32) (local $4 i32) (local $5 i32) (local $scratch i32) (local $3 i64) (local $6 i64)
      i32.const 77986
      i32.load16_u
      i32.const 16
      local.tee $2
      i32.shl
      local.get $2
      i32.shr_s
      i32.const 65533
      i32.eq
      i32.const 77986
      i32.load16_u
      i32.const 16
      i32.shl
      i32.and
      if (result i32)  ;; label = @4
        call $external_function
        i32.const 0
      else
        i32.const 1
      end
    )
  (memory $0 258 258)
  (export "_start" (func $_start)))

wasm-opt (d0d970cb5) eliminates the dead br_if body by -all -O2 but can not do that by -all -O3.

Analysis

Similar but different to #7440, this time wasm-opt in O3 introduces side-effect in more complex structure by local-cse:

The figure below depicts how the local.cse transforms the input:

Image

It introduces the local.tee for less code size, however, it also affects the constant propagation and further blocks the dead code elimination.

Different from #7440, whose solution is simply enhancing the optimization of the binaryOp regardless that the side effects of its children, this issues is more complex because the local-cse does CSE in different depth and sub-tree, is there any code logic which can check or deal with this issue?

xuruiyang2002 avatar Apr 09 '25 12:04 xuruiyang2002

We do track the bits of locals, but I guess that isn't enough here - we need to see that the shl and shr_s combine to form a 16-bit sign-extend (after that, comparison to 65533 is always false, as the sign bit of a 16-bit number can't be set without the higher bits as well).

We could in theory track "partial sign-extends". Or, more generally, we could "look through" locals to their source, maybe in some optimize-instructions-propagate that would parallel precompute-propagate (in that it looks through locals, using a LocalGraph; perhaps we could do it concretely by expanding "getFallthrough", that is, we'd look not just at fallthrough values in the expression itself, but that arrive via locals). Another option could be to "expand" local-cse results like this, knowing we can re-optimize them later if we fail to find a pattern for them.

None of those options is simple, and I tend to think the benefit is low, so I'd say this is low priority.

kripken avatar Apr 09 '25 22:04 kripken

Understood, and I agree. It’s might be better to leave this for later.

xuruiyang2002 avatar Apr 10 '25 03:04 xuruiyang2002