[Bug][CodeGen] v0.18.0: compilation after tir.Simplify causes a segmentation fault
Environment Information
OS: Ubuntu 20.04 LTS
Python: 3.10.4
TVM: v0.18.0, built from source, CPU only (no GPU in use)
Steps to Reproduce:
import tvm
from tvm import tir
from tvm.tir.analysis.analysis import verify_well_formed, verify_memory
from tvm.script import tir as T
@T.prim_func
def main(p0: T.Buffer((1, 4, 4, 512), "float32"), T_relu: T.Buffer((1, 4, 4, 512), "float32")):
    T.func_attr({"from_legacy_te_schedule": T.bool(True), "hash": "8a54a445c9c66af6", "target": T.target({"host": {"keys": ["cpu"], "kind": "llvm", "mtriple": "x86_64-pc-linux-gnu", "tag": ""}, "keys": ["cpu"], "kind": "llvm", "mtriple": "x86_64-pc-linux-gnu", "tag": ""}), "tir.noalias": T.bool(True)})
    for ax0_ax1_fused in T.parallel(4):
        for ax2, ax3_outer in T.grid(4, 32):
            f6a = T.uint32()
            cse_var_1: T.int32 = T.max(-1390372897, T.Shuffle([T.Broadcast(T.Cast("int32", T.ldexp(T.Cast("float32", T.Shuffle([T.Broadcast(1845025892, 3)], [0])), T.Cast("float32", T.Shuffle([T.Broadcast(928309885, 3) - T.Broadcast(-929172350, 3)], [2])))), 3), T.Broadcast(1237026474, 4) // T.Broadcast(-136990005, 4) * T.min(T.Broadcast(2087815492, 4), T.Broadcast(73625980, 4)), T.Broadcast(-1604583078, 3), T.Broadcast(T.Cast("int32", T.nextafter(T.Cast("float32", T.Shuffle([T.Broadcast(-1615280813, 2)], [0])), T.Cast("float32", T.Shuffle([T.Broadcast(217806831, 2)], [1])))), 2), T.min(T.Cast("int32x3", T.max(T.Broadcast(T.fabs(T.Shuffle([T.Broadcast(T.float32(0.18184940914562642), 3)], [2])), 3), T.truncmod(T.max(T.Broadcast(T.float32(0.25323582565477309), 3), T.Broadcast(T.float32(0.45124937914803442), 3)), T.Broadcast(T.float32(0.84180151259624503), 3))) * T.Broadcast(T.float32(0.85337293296299876), 3)), T.Div(T.Broadcast(1075650555, 3), T.Broadcast(-1611576096, 3))) % T.Cast("int32x3", T.min(T.min(T.Broadcast(T.Cast("uint32", T.sqrt(T.Cast("float32", T.Mul(T.uint32(172697918), T.uint32(41149579))))), 3), T.Broadcast(T.uint32(1159702194), 3)), T.Broadcast(T.Cast("uint32", T.popcount(T.Cast("int32", T.Shuffle([T.Broadcast(T.uint32(908195236), 3) + T.Broadcast(T.uint32(653293327), 3)], [2])))), 3)))], [T.Let(T.Broadcast(T.uint32(1121529639), 3), where={f6a: T.uint32(192542249)})])) + ax2 * 512 + ax3_outer * 16
            T_relu_1 = T.Buffer((8192,), data=T_relu.data)
            p0_1 = T.Buffer((8192,), data=p0.data)
            T_relu_1[cse_var_1:cse_var_1 + 16] = T.max(p0_1[cse_var_1:cse_var_1 + 16], T.Broadcast(T.float32(0.0), 16))
func = main
mod = tvm.ir.IRModule({'main': func})
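# Parenthesized so that validation fails if either check fails
# (`not a and b` parses as `(not a) and b`).
if not (verify_well_formed(mod) and verify_memory(func)):
    print("Validation failed")
else:
    print("Beginning Compilation")
    with tvm.transform.PassContext(opt_level=4):
        nopt_mod = tvm.build(mod)
    print("Success!")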
Expected Behavior:
Compilation succeeds, or TVM reports why the input being compiled is invalid.
Actual Behavior:
Segmentation fault (core dumped) when running the script above.
Simplifying the huge expression tree reveals this error:
InternalError: Check failed: (val && *val >= 0 && *val < total_lanes) is false: Shuffled indeces are suppose to be int, but get T.Let(T.Broadcast(T.uint32(1121529639), 3), where={f6a: T.uint32(192542249)})
Replacing that T.Let(...) shuffle index with T.uint32(1) makes compilation succeed.
Thank you!
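For readers triaging this, the condition quoted in the error message (val && *val >= 0 && *val < total_lanes) can be mirrored in plain Python to show why a literal index such as T.uint32(1) passes while a T.Let(...) index does not. This is only a rough model of that condition, not TVM's code: the function name check_shuffle_indices is hypothetical, and the lane count of 15 is my own sum of the vector widths in the reproducer (3 + 4 + 3 + 2 + 3), so treat it as an assumption.

```python
def check_shuffle_indices(indices, total_lanes):
    """Model of the quoted check: every index must be a constant integer
    in the half-open range [0, total_lanes). Raises ValueError otherwise."""
    for pos, idx in enumerate(indices):
        # A non-constant expression (for example a Let node) has no integer
        # value to extract; here anything that is not a plain int stands in
        # for that case. bool is excluded because it subclasses int.
        if not isinstance(idx, int) or isinstance(idx, bool):
            raise ValueError(f"index {pos} is not a constant integer: {idx!r}")
        if not 0 <= idx < total_lanes:
            raise ValueError(f"index {pos} = {idx} is outside [0, {total_lanes})")
    return True


# Assumed lane count for the outer Shuffle in the reproducer.
ASSUMED_TOTAL_LANES = 15
```

Under these assumptions, check_shuffle_indices([1], ASSUMED_TOTAL_LANES) is accepted, whereas passing a placeholder for the Let expression, such as the string "T.Let(...)", raises ValueError.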
In this case, is it reasonable to expect the compiler to surface the internal error rather than crash with a segmentation fault when the TIR is invalid?
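Until the failure is reported as an exception, a segmentation fault cannot be caught with try/except, so one workaround is to run the reproducer in a child process and inspect how it ended. The sketch below uses only the Python standard library; run_isolated and describe are hypothetical helper names, and the negative-return-code convention applies to POSIX systems such as the Ubuntu setup above.

```python
import signal
import subprocess
import sys


def run_isolated(script_args, timeout=600):
    """Run the Python interpreter with the given arguments in a child process.

    Returns (returncode, stderr_text). On POSIX, a return code of -N means
    the child was terminated by signal N. The child is started with
    `-X faulthandler` so Python dumps a traceback to stderr on a fatal signal.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Translate a child's return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode
```

With the script from this report saved as repro.py (a hypothetical file name), calling run_isolated(["repro.py"]) and passing the return code to describe would distinguish a clean exit, an ordinary Python exception, and a crash from a signal, while the parent process keeps running.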