Diego Caballero
## Running Example: I'll use the following i8->i32 reduction example throughout this proposal for illustration purposes: ``` #map3 = affine_map (d0, d1)> #map4 = affine_map (d0)> util.global private @__A {noinline}...
### Request description Opening this mostly for discussion. I'm seeing quite a few dispatches in mobilebert-quant that are identical except for the constants consumed by `tosa.apply_scale`. They look like this:...
WIP
### Request description The benchmarking infrastructure will track the following models: - MobileNetV1-float - MobileBert-float - DeepLabV3-float - EfficientNet-Lite0-quant - MobileBert-quant - PersonDetect-quant We should add all these models to...
Some statically-shaped convolutions currently remain scalar, at least on RISC-V. The RISC-V models most impacted by this issue are EfficientNet and PersonDetect. Some dispatches to repro: EfficientNet: ```...
Tile size computation in LLVMCPU is crying out for a refresh. The current approach is getting difficult to maintain and debug even for those familiar with the code. The goal...
The current implementation of `tosa.resize` is not vectorizable as it generates a complex `tensor.extract` operation. However, it looks like some `tosa.resize` ops can be canonicalized away. This issue is to...
The following dispatch from DeepLabV3 is not vectorized: ``` hal.executable private @main_dispatch_78 { hal.executable.variant public @embedded_elf_riscv_64, target = { hal.executable.export public @main_dispatch_78_generic_1x257x257x21 ordinal(0) layout(#hal.pipeline.layout) { ^bb0(%arg0: !hal.device, %arg1: index, %arg2:...
We currently generate the following instruction sequence for i8/i32 GEMMs: ``` vsetvli zero,a5,e32,m2,ta,mu vle8.v v26,(s5) vsext.vf4 v28,v26 vmacc.vx v24,s8,v28 vmacc.vx v22,s9,v28 vmacc.vx v20,s10,v28 vmacc.vx v18,s11,v28 vmacc.vx v16,ra,v28 vmacc.vx v14,a6,v28...
It looks like we currently don't allow reassociation on floating-point reductions. However, some TFLite implementations that we use for comparison allow reassociation. We should introduce a flag to allow fp...
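For context on why reassociation has to be opt-in, here is a minimal Python sketch (not IREE code) showing that a floating-point reduction can produce a different result once its additions are reordered, which is what a vectorized reduction does when it accumulates into independent lanes. The function names `sequential_sum` and `lane_sum` and the input values are hypothetical, chosen only for illustration.

```python
def sequential_sum(values):
    """Reduce strictly left to right, as a scalar loop without reassociation would."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def lane_sum(values, lanes):
    """Reduce with `lanes` independent partial sums, then combine them.

    This mimics the reordering a vectorized reduction performs; it is only
    valid if floating-point addition may be reassociated.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    acc = 0.0
    for p in partial:
        acc += p
    return acc


# Hypothetical input: the large terms cancel, but the order decides whether
# the small terms are absorbed by rounding before the cancellation happens.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0: the first 1.0 is lost when added to 1e16
print(lane_sum(values, 2))     # 2.0: the large terms cancel in their own lane
```

Both results are legitimate roundings of the same mathematical sum, so comparing against an implementation that reassociates can show mismatches that are not bugs.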