Alexander Root

Results 18 issues of Alexander Root

Left as a draft for now because this work is unfinished, but I was hoping to get feedback. This PR is intended to create a separate vector-instruction-selection pass for x86,...

The issue in #6893 is the result of an attempt to dereference a pointer to the module_state pointer. This PR simply adds a runtime method to dereference the pointer, in...

This PR provides a series of methods for removing/simplifying correlated expressions for `find_constant_bounds`: - Bounded let-substitutions (~~n=100~~ edit: n=16). We don't want to always substitute all lets, but some constant...

The InjectHexagonRpc mutator inserts `Let` nodes that label `Load` names instead of `Variable` names, specifically in the definition of `state_var()`. This breaks let substitutions (i.e. `RemoveLets()` in CSE.cpp), as `Let`...

This PR adds support for `int16 -> int32` horizontal widening adds to use `pmaddwd`, and pattern matches on horizontal adds to use `phadd(w | d)`, which is faster than the...
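
As a rough illustration of the `pmaddwd` part of this description (not the PR's code): `pmaddwd` multiplies packed signed 16-bit lanes and adds each adjacent pair of 32-bit products, so multiplying by a vector of ones yields a pairwise `int16 -> int32` widening add. The scalar model below assumes a single 128-bit register, and the names `pmaddwd_model` and `horizontal_widening_add` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Scalar model of x86 pmaddwd on one 128-bit register (assumed width):
// eight int16 lanes per operand in, four int32 lanes out.
std::array<int32_t, 4> pmaddwd_model(const std::array<int16_t, 8> &a,
                                     const std::array<int16_t, 8> &b) {
    std::array<int32_t, 4> out{};
    for (int i = 0; i < 4; i++) {
        // Sum the two adjacent products in 64 bits, then wrap to 32 bits.
        // Wrapping only matters when all four inputs are -32768.
        int64_t sum = int64_t(a[2 * i]) * b[2 * i] +
                      int64_t(a[2 * i + 1]) * b[2 * i + 1];
        out[i] = static_cast<int32_t>(static_cast<uint32_t>(sum));
    }
    return out;
}

// Hypothetical helper: a pairwise widening add is pmaddwd against all ones.
std::array<int32_t, 4> horizontal_widening_add(const std::array<int16_t, 8> &a) {
    std::array<int16_t, 8> ones;
    ones.fill(1);
    return pmaddwd_model(a, ones);
}
```

With a multiplier of one, each output lane is at most `2 * 32767` in magnitude, so the widening add itself can never overflow.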

performance

I have a branch that is ~30 commits behind master (specifically up to date with 813eadc) that reports a manual schedule of depthwise_separable_conv of: ``` Manually-tuned time: 0.369731ms ``` Meanwhile,...

Running the adams2019 autoscheduler (with default parameters) on harris is producing the following OOB error on my machine: ``` Error: Input buffer input is accessed at -5, which is before...

Using the adams2019 autoscheduler, on master. The following series of commands are run from apps/local_laplacian: ``` make clean make bin/host/local_laplacian.generator # Make a runtime ./bin/host/local_laplacian.generator -r runtime -o bin/host target=host...

On the ARM backend, we should be targeting the USDOT/SUDOT instructions for mixed-sign dot products, e.g. when compiling [conv3x3](https://github.com/halide/Halide/blob/main/apps/hexagon_benchmarks/conv3x3_generator.cpp) with accumulator type `Int(32)`. LLVM exposes an intrinsic for [USDOT](https://github.com/llvm/llvm-project/blob/1d8a7adca6afa479d2913189631d941e0b084825/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L1037).
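
For reference, a scalar sketch of what a 128-bit vector USDOT computes: each 32-bit accumulator lane receives the sum of four products of an unsigned 8-bit element and a signed 8-bit element. This models the instruction's arithmetic only. It is not Halide or LLVM code, the name `usdot_model` is hypothetical, and it assumes the accumulator does not overflow.

```cpp
#include <array>
#include <cstdint>

// Scalar model of one 128-bit USDOT:
//   acc[i] += sum over j in [0, 4) of u[4*i + j] * s[4*i + j]
// where u holds unsigned 8-bit lanes and s holds signed 8-bit lanes.
// Assumes the int32 accumulator stays in range (hardware would wrap).
std::array<int32_t, 4> usdot_model(std::array<int32_t, 4> acc,
                                   const std::array<uint8_t, 16> &u,
                                   const std::array<int8_t, 16> &s) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            // Widen both operands to int32 before multiplying so the
            // unsigned operand is zero-extended and the signed one is
            // sign-extended.
            acc[i] += int32_t(u[4 * i + j]) * int32_t(s[4 * i + j]);
        }
    }
    return acc;
}
```

This is the pattern a 3x3 convolution with `uint8` inputs, `int8` weights, and an `Int(32)` accumulator reduces to once its multiply-adds are grouped in fours.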

performance

This PR is just for getting buildbot coverage; I'll need to do some messy rebasing after #6884 is merged in. This passes all tests on my M1, but I wanted...