binaryen List of missing peephole optimizations

Is it possible doing some trivial logical simplifications for now? Or better wait souper with z3 solver for that? like simplify from:

(func $test (export "test") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const 10
    i32.gt_s
    get_local $p0
    i32.const 10
    i32.eq
    i32.or)

to:

(func $test (export "test") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const 9
    i32.gt_s)

or:

(func $test (export "test") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const 10
    i32.ge_s)

DONE: from:

  get_global $i
  i32.const 0
  i32.le_s
  i32.eqz
  if

to:

  get_global $i
  i32.const 0
  i32.gt_s
  if

Nov 24 '18 11:11 MaxGraey

Good idea, we should do these!

It's not hard to add them. The correct pass is OptimizeInstructions (src/passes/OptimizeInstructions.cpp). Do you want to do it?

Nov 24 '18 16:11 kripken

Oh, great! I think better if you start some basics and may be after I add additional methods/rules for that pass =)

Nov 24 '18 17:11 MaxGraey

@MaxGraey Sounds good, I can get the stuff set up so that it's just a matter of adding the specific rules into that pattern. Take a look at https://github.com/WebAssembly/binaryen/commit/d3e5d4416f1bcb8fe3998db39351be766ba250aa - I added the first rule into combineOr there, and a test in $combine-or. The rest should be straightforward, let me know if you need any help.

(I realized I need to land a few other things before that commit, so I can't land it right now, but posting this now so you can get started if you want.)

Nov 24 '18 23:11 kripken

Great! Are you plan merge this to master? Or I can just fork oi2 and continue work in separate branch?

btw I use one trick with range checking if all arguments is unsigned. Like:

(x >= C0) & (x <= C1)

where x, C0 and C1 are unsigned ints

(unsigned)x - C0 <= C1 - C0

Also integer sequence of logical expressions like:

(x == C) | (x == C + 1) | ... | (x == N)

could transform to:

(unsigned)(x - C) <= N - C

is it make sense implement this rule as well, wdyt?

Nov 25 '18 15:11 MaxGraey

@MaxGraey

Yes, for now fork oi2 and work on that, since I might not be able to land it today.

That unsigned trick looks valid to me. Maybe leave it for a separate PR afterwards though, so there isn't too much in one?

Nov 25 '18 15:11 kripken

Also figure outing some useful rules for integer (sing/unsign) comparators:

x > x - 1

could always transform to:

x != MIN_VALUE

and

x + 1 > x

could always transform to:

x != MAX_VALUE

Nov 27 '18 23:11 MaxGraey

Also wdyt using for x != 0:

get_local $0
i32.eqz
i32.eqz

instead

get_local $0
i32.const 0
i32.ne

This reducing code by 1 byte, but generate worse machine code so this probably useful only for shrink code >= 2. Of course this not very likely code because if/else/select don't require this operation explicitly

Nov 27 '18 23:11 MaxGraey

Comment before last looks good to me.

Last comment also looks good - as you said in if or select etc. inputs we can flip the arms and avoid != 0 anyhow, but there are cases when it happens elsewhere, so i think it's a good rule to add.

Nov 28 '18 21:11 kripken

@MaxGraey how are you getting along with these optimizations? Anything I can help with?

Dec 20 '18 21:12 kripken

@kripken Sorry, currently I busy on work but I hope I will return to this on weekend

Dec 20 '18 21:12 MaxGraey

Just note. Found another possible improvments like transform this:

i32.const 1
i32.const 0
get_local $p0
select

or this:

get_local $p0
if $I0 (result i32)
  i32.const 1
else
  i32.const 0
end

to this (depending on $p0 type)

get_local $p0
i32.const 0
i32.ne

UPDATE

Implemented in #2869

Jan 30 '19 17:01 MaxGraey

More possible optimizations:

from:

(func $t1 (export "t1") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const 0
    i32.lt_s)

to:

(func $t1 (export "t1") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const 31
    i32.shr_s)

from:

 (func $t2 (export "t2") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const -1
    i32.ne)

to (but not always possible, probably valid only if result used as i1 in next steps):

(func $t2 (export "t2") (type $t0) (param $p0 i32) (result i32)
    get_local $p0
    i32.const -1
    i32.xor)

Mar 10 '19 10:03 MaxGraey

one more peephole optimization:

~(1 << n) -> rotl(-2, n)

which looks like in wat as:

    i32.const 1
    get_local $p0
    i32.shl
    i32.const -1
    i32.xor

to:

    i32.const -2
    get_local $p0
    i32.rotl

Done!

Apr 25 '20 08:04 MaxGraey

(x >> C) != 0 =========> (unsigned)x > ((1 << C) - 1) ((unsigned)x >> C) > 0 ==> (unsigned)x > ((1 << C) - 1) ((signed)x >> C) > 0 ====> (signed)x > ((1 << C) - 1)

(x >> C) == 0 =========> (unsigned)x < ((1 << C) - 1) (x >> C) < 0 ==========> (unsigned)x < ((1 << C) - 1) ((signed)x >> C) < 0 ====> (unsigned)x >> 31 ~~((unsigned)x >> C) < 0 ==> 0~~

May 24 '20 09:05 MaxGraey

Such distributive optimizations for integers also missing currently:

x * y + x * z
x * y - x * z

to

x * (y + z)
x * (y - z)

Upd PR: #3132

May 24 '20 19:05 MaxGraey

~~(signed)x % С_pot == 0 ==> (x & (С_pot - 1)) == 0~~ ~~(signed)x % С_pot != 0 ==> (x & (С_pot - 1)) != 0~~ x % C == C ==> 0 x % C != C ==> 1, if C != 0

(unsigned)x % C >= C, (unsigned)x % C > C ==> 0, if C != 0

(signed)x % C >= C, (signed)x % C > C ==> C < 0, if C != 0

(unsigned)x % C <= C, (unsigned)x % C < C ==> 1, if C != 0

(signed)x % C <= C, (signed)x % C < C ==> (C ^ -1) < 0, if C != 0

May 27 '20 06:05 MaxGraey

Pretty common patterns. But what is surprising is neither LLVM nor GCC can't optimize this: (x > y) - (x < y) < 0 -> x < y (x > y) - (x < y) <= 0 -> x <= y (x > y) - (x < y) == 0 -> x == y (x > y) - (x < y) > 0 -> x > y (x > y) - (x < y) >= 0 -> x >= y

Need to think could we generalize this more

Jun 16 '20 11:06 MaxGraey

// add-sub integer patters (x + y) - x -> y (x - y) - x -> 0 - y (x + y) - y -> x (x - y) - y -> x - (y << 1)

x - (x + y) -> 0 - y ~~x - (x - y) -> y~~ (#3014) y - (x + y) -> 0 - x y - (x - y) -> (y << 1) - x

(x + y) + x -> (x << 1) + y (x - y) + x -> (x << 1) - y (x + y) + y -> (y << 1) + x (x - y) + y -> x

however probably better add general Reassociate sub pass which also handle others associative ops like |, &, ^

// div-mul integer patters ((signed)x / y) * y -> x - ((signed)x % y) ((unsigned)x / y) * y -> x - ((unsigned)x % y), if pure(x) && pure(y) && y != C

Jun 16 '20 12:06 MaxGraey

~~-(x + C) -> -C - x~~ ~~-(C - x) -> x - C~~ -1 - x -> x ^ -1

Jun 21 '20 22:06 MaxGraey

bool(x) ? -1 : 0 -> 0 - x bool(x) ? 0 : -1 -> 0 - (x ^ 1)

where bool(x) is getBits(x) == 1

Jul 02 '20 16:07 MaxGraey

float point simplifications:

x  > 0.0 ? 1.0 : -1.0 -> copysign(1.0, +x) x >= 0.0 ? 1.0 : -1.0 -> copysign(1.0, +x) x  < 0.0 ? 1.0 : -1.0 -> copysign(1.0, -x) x <= 0.0 ? 1.0 : -1.0 -> copysign(1.0, -x)

x  > 0.0 ? -1.0 : 1.0 -> copysign(1.0, -x) x >= 0.0 ? -1.0 : 1.0 -> copysign(1.0, -x) x  < 0.0 ? -1.0 : 1.0 -> copysign(1.0, +x) x <= 0.0 ? -1.0 : 1.0 -> copysign(1.0, +x)

copysign(x, +C) -> abs(x) copysign(x, -C) -> -abs(x)

x * copysign(1.0, x) -> abs(x) x * copysign(1.0, -x) -> -abs(x) ~~abs(x) * abs(x) -> x * x~~ (#3013)

copysign(x, y) == 0 -> x == 0

copysign(copysign(x, y), z) -> copysign(x, z) copysign(x, y) * copysign(x, y) -> x * x

~~sqrt(x) < C -> x >= 0 && x < C * C~~

Aug 01 '20 08:08 MaxGraey

x / C_pot -> x * (1 / C_pot) (#3018)

where x -> float point, C_pot power of two float point constant

Aug 02 '20 07:08 MaxGraey

~~x * 2.0 -> x + x~~ (#3016) done!

It seems this transform may reduce up to 7 bytes due to wasm doesn't use any variant encoding for floats cc @kripken

Aug 02 '20 07:08 MaxGraey

~~i32(i64(x)) -> x, where x::i32~~ done ~~i32(u64(x)) -> x, where x::i32~~ done same as:

i64.extend_s/i32 ;; or i64.extend_u/i32
i32.wrap/i64

~~i64(u32(x)) -> x & 0xFFFFFFFF, where x::i64~~ not optimal ~~or~~ ~~i32.wrap/i64 i64.extend_u/i32 ;; but not for extend_s/i32~~

Aug 15 '20 15:08 MaxGraey

x ? 0 : C_pot -> (!x) << log2(C_pot) x ? C_pot : 0 -> (!!x) << log2(C_pot)

x * (y ? C1 : C2) -> y ? C1 * x : C2 * x, if C1, C2 <- -1 | 0 | 1

examples:

x * (y ? -1 : 1) -> y ? -x : x x * (y ? 1 : 0) -> y ? x : 0 x * (y ? -1 : 0) -> y ? -x : 0

Aug 29 '20 07:08 MaxGraey

~x | x -> -1 ~~~x ^ x -> -1~~ ~x + x -> -1 ~x + 1 -> 0 - x

~~-x - 1 -> ~x (x ^ -1)~~ -1 - x -> ~x (x ^ -1)

Oct 02 '20 14:10 MaxGraey

~~x < 0 ? -1 : 1 -> x >> 31 | 1~~ done!

Mar 05 '21 01:03 MaxGraey

x / C -> 0 if maxBits(x) < maxBits(C)

like: (x & 3) / 4 -> 0

Sep 12 '21 11:09 MaxGraey

i32(f64.max(f64(x), f64(y)) -> select(x, y, x > y), if type(x & y) == i32 | u32 i32(f64.min(f64(x), f64(y)) -> select(x, y, x < y), if type(x & y) == i32 | u32

i32(f64.abs(f64(x))) -> select(-x, x, x < 0), if type(x) == i32

DONE

~~i32(f64(x)) -> x, if type(x) == i32 | u32~~ ~~u32(f64(x)) -> x, if type(x) == i32 | u32~~

~~floor(f64(x)) -> f64(x), if type(x) == i32 | u32~~ ~~ceil(f64(x)) -> f64(x), if type(x) == i32 | u32~~ ~~trunc(f64(x)) -> f64(x), if type(x) == i32 | u32~~ ~~nearest(f64(x)) -> f64(x), if type(x) == i32 | u32~~

Sep 16 '21 20:09 MaxGraey

-(i32(x) / C) -> i32(x) / -C, if x != u32, C != min_s

this can't be optimized nor div_s either div_u: ~~-x / C -> x / -C~~

Sep 24 '21 14:09 MaxGraey