[Bug] [RISC-V RVV] max_pool2d operator shows minor performance regression
Description
The max_pool2d operator shows slight performance degradation with the RISC‑V Vector (RVV) extension, achieving 0.867× the performance of the scalar implementation. While the regression is smaller than that of other operators, it still indicates suboptimal vectorization for 2D max pooling.
Steps to Reproduce
- Generate the max_pool2d operator with the following configuration:
```python
params = {
    "dtype": "float32",
    "batch": 14,
    "pool_channels": 23,
    "pool_size": 2,
    "stride": 4,
    "padding": 1,
    "input_height": 99,
    "input_width": 95,
}
```
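For reference, the output shape implied by this configuration can be derived from the standard pooling formula (a quick sketch, independent of TVM):

```python
# Standard pooling output-size formula:
#   out = floor((in + 2*pad - kernel) / stride) + 1
def pooled_dim(size, kernel, stride, pad):
    return (size + 2 * pad - kernel) // stride + 1

out_h = pooled_dim(99, kernel=2, stride=4, pad=1)  # input_height -> 25
out_w = pooled_dim(95, kernel=2, stride=4, pad=1)  # input_width  -> 24

# Full NCHW output shape for this configuration:
print((14, 23, out_h, out_w))  # (14, 23, 25, 24)
```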
- Export the operator to two targets:
  - RV target (scalar, without vector extension):
    `llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c`
  - RVV target (with vector extension):
    `llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v`
- Run performance measurement on both targets.
Operator definition code:
```python
from tvm import relay

def export_max_pool2d(params, set_dir=None, platform="rv"):
    data = relay.var(
        "data",
        shape=(params["batch"], params["pool_channels"],
               params["input_height"], params["input_width"]),
        dtype=params["dtype"],
    )
    pool = relay.nn.max_pool2d(
        data,
        pool_size=(params["pool_size"], params["pool_size"]),
        strides=(params["stride"], params["stride"]),
        padding=(params["padding"], params["padding"]),
    )
    export_op(pool, params["op_name"], [data], params, set_dir=set_dir)
```
Performance Data
- RV execution time: 8.357100 ms
- RVV execution time: 9.634620 ms
- Acceleration ratio (RV/RVV): 0.867 (RVV is ~1.15× slower)
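The reported ratio can be sanity-checked directly from the two timings (plain arithmetic, no TVM needed):

```python
rv_ms = 8.357100   # scalar RV execution time (ms)
rvv_ms = 9.634620  # RVV execution time (ms)

ratio = rv_ms / rvv_ms      # acceleration ratio (RV/RVV)
slowdown = rvv_ms / rv_ms   # how much slower RVV is

print(round(ratio, 3))     # 0.867
print(round(slowdown, 2))  # 1.15
```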
Environment Information
- TVM version: 0.19.0
- LLVM version: [Please provide output of `llvm-config --version`]
- Hardware: Spacemit K1‑X bit‑brick board
- CPU: Spacemit X60 (8 cores, 1.6 GHz)
- ISA: rv64imafdcv (with vector extensions)
- Memory: 7.6 GB
- OS: Bianbu 2.2, Linux kernel 6.6.63
- Operation: 2×2 max pooling with stride 4 on input shape (14, 23, 99, 95)
Expected Behavior
RVV vectorization should provide a performance improvement over the scalar RV baseline for 2D pooling operations like max_pool2d.
Additional Context
- The operation performs 2×2 max pooling with stride 4 and padding 1 on a 4D tensor.
- While the performance regression is less severe than for other operators, it still indicates that the vectorized implementation of 2D max pooling may have inefficiencies in memory access patterns or vector reduction within pooling windows.
- This is part of a broader pattern where all tested operators show performance degradation with RVV, suggesting potential issues with vectorization strategies in TVM's RISC‑V backend.
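To make the window semantics concrete, here is a minimal pure-Python reference of single-channel 2D max pooling with padding. This is illustrative only, not TVM's implementation; padding uses `-inf` as the neutral element so padded cells never win the max:

```python
import math

def max_pool2d_ref(x, kernel, stride, pad):
    """Naive single-channel 2D max pooling reference (lists of lists)."""
    h, w = len(x), len(x[0])
    neg_inf = -math.inf
    # Pad the input with -inf, the identity element for max.
    padded = [[neg_inf] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = x[i][j]
    out_h = (h + 2 * pad - kernel) // stride + 1
    out_w = (w + 2 * pad - kernel) // stride + 1
    out = [[neg_inf] * out_w for _ in range(out_h)]
    # Reduce each kernel x kernel window to its maximum.
    for oi in range(out_h):
        for oj in range(out_w):
            for ki in range(kernel):
                for kj in range(kernel):
                    v = padded[oi * stride + ki][oj * stride + kj]
                    if v > out[oi][oj]:
                        out[oi][oj] = v
    return out

x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
# 2x2 pooling, stride 4, padding 1 (same kernel/stride/pad as the bug report):
print(max_pool2d_ref(x, kernel=2, stride=4, pad=1))  # [[1, 4], [13, 16]]
```

The inner reduction over the pooling window is the part the report suspects vectorizes poorly under RVV.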