Cranelift: Quadratic compile time from `can_optimize_var_lookup`
It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
(func (param i32)
(block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
(block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
... repeated 15,000 times
)
)
and a flamegraph: shows that almost the entire time is spent in
can_optimize_var_lookup. Note that the wasm file is 147 KB.
Steps to Reproduce
- Download the attached
blocks.wasm.gzand unzip it (or take the abovewatand convert it towasm). - Compile it with
time target/release/wasmtime compile blocks.wasm.
Expected Results
I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.
Actual Results
The compilation takes over 2 seconds.
Versions and Environment
Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51
Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic
Architecture:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7302 16-Core Processor
Extra Info
I'm working on a PR which should help with this.
My draft fix for this is in #4939.