[Bug] [RISC-V RVV] softmax operator shows suboptimal vectorization

Open yanyanyanggg opened this issue 2 months ago • 0 comments

Description

The softmax operator runs slower when compiled for the RISC‑V Vector (RVV) extension than when compiled for the scalar target: the RVV build reaches only 0.745× the speed of the scalar build (RV time / RVV time). This suggests that the code generated for softmax is vectorized inefficiently.

Steps to Reproduce

Generate the softmax operator with the following configuration:

params = {
    "dtype": "float32",
    "batch": 14,
    "features": 185
}

Export the operator to two targets:

RV target (scalar, without vector extension):

llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c

RVV target (with vector extension):

llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
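
As a quick sanity check that the two builds differ only in the vector extension, the `-mattr` lists can be compared directly. This is a minimal sketch in plain Python; the helper name `mattr_features` is made up for illustration and is not part of TVM or LLVM.

```python
# Target strings copied verbatim from this issue.
RV_TARGET = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
             "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c")
RVV_TARGET = ("llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
              "-mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v")


def mattr_features(target):
    """Return the set of features listed in the -mattr option (hypothetical helper)."""
    for token in target.split():
        if token.startswith("-mattr="):
            return set(token[len("-mattr="):].split(","))
    return set()


# Features present only in the RVV target, and only in the RV target.
only_in_rvv = mattr_features(RVV_TARGET) - mattr_features(RV_TARGET)
only_in_rv = mattr_features(RV_TARGET) - mattr_features(RVV_TARGET)
print(sorted(only_in_rvv), sorted(only_in_rv))
```

If the only difference is `+v`, any timing gap between the two builds comes from how the compiler uses the vector extension rather than from another flag.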

Run performance measurement on both targets.

Operator definition code:

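from tvm import relay

# export_op is the reporter's own helper and is not shown in this issue.
def export_softmax(params, set_dir=None, platform="rv"):
    # Note: the platform argument is not used in this snippet.
    data = relay.var("data", shape=(params["batch"], params["features"]),
                     dtype=params["dtype"])
    softmax = relay.nn.softmax(data)  # default axis=-1, i.e. over the features axis
    # params["op_name"] is not in the configuration listed above; the caller must set it.
    export_op(softmax, params["op_name"], [data], params, set_dir=set_dir)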

Performance Data

RV execution time: 1.831500 ms
RVV execution time: 2.457040 ms
Speedup (RV time / RVV time): 0.745 (the RVV build takes ~1.34× as long)
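
For reference, the ratio can be recomputed from the two timings reported above. This is only the arithmetic; the variable names are chosen for illustration.

```python
# Execution times reported in this issue, in milliseconds.
rv_ms = 1.831500
rvv_ms = 2.457040

speedup = rv_ms / rvv_ms    # values below 1 mean the RVV build is slower
slowdown = rvv_ms / rv_ms   # how many times longer the RVV build takes

print(f"speedup  (RV/RVV): {speedup:.3f}")
print(f"slowdown (RVV/RV): {slowdown:.2f}")
```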

Environment Information

TVM version: 0.19.0
LLVM version: [Please provide: llvm-config --version]
Hardware: Spacemit K1‑X bit‑brick board
CPU: Spacemit X60 (8 cores, 1.6 GHz)
ISA: rv64imafdcv (with vector extensions)
Memory: 7.6 GB
OS: Bianbu 2.2, Linux kernel 6.6.63
Operation: Softmax on a 2D tensor of shape (14, 185)

Expected Behavior

RVV vectorization should provide a performance improvement over the scalar RV baseline for softmax operations, which involve reduction and elementwise operations that can be vectorized.
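
To make the reduction and elementwise steps concrete, here is a minimal pure-Python sketch of a numerically stable row-wise softmax on the (14, 185) shape from this issue. It follows the standard mathematical formulation and is not claimed to match the loop structure TVM actually generates; the input values are arbitrary placeholders.

```python
import math


def softmax_row(row):
    """Numerically stable softmax of one row, written as four separate passes."""
    row_max = max(row)                            # pass 1: max reduction
    exps = [math.exp(x - row_max) for x in row]   # pass 2: elementwise exp
    total = sum(exps)                             # pass 3: sum reduction
    return [e / total for e in exps]              # pass 4: elementwise divide


batch, features = 14, 185
# Arbitrary placeholder input; any finite values work.
data = [[0.01 * (i * features + j) for j in range(features)] for i in range(batch)]
out = [softmax_row(row) for row in data]

# Each output row should sum to 1 up to floating-point error.
print(max(abs(sum(row) - 1.0) for row in out))
```

Passes 1 and 3 are reductions and passes 2 and 4 are elementwise, so each of the four could in principle map onto vector instructions; the `exp` call is usually the most expensive of them.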

Additional Context

The softmax operation is applied to a 2D tensor of shape (14, 185), which is a relatively small tensor compared to other operators tested.
The slowdown, though less severe than for some other operators, still suggests that the vectorized softmax may be inefficient in its reduction and exponentiation steps.
This is part of a broader pattern where multiple operators show performance degradation with RVV, suggesting potential issues with vectorization strategies for reduction and elementwise operations.
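
One thing worth checking for a tensor this small is how a row of 185 float32 values divides into vector registers. The sketch below is only arithmetic: the issue does not state the board's VLEN, so 128 and 256 bits are illustrative assumptions, and the helper name `vector_chunks` is hypothetical.

```python
def vector_chunks(n_elements, vlen_bits, elem_bits=32, lmul=1):
    """Split n_elements into full vector iterations plus a tail (hypothetical helper).

    vlen_bits is the vector register width in bits; lmul is the register
    grouping factor. Both are assumptions here, not values from the issue.
    """
    lanes = vlen_bits * lmul // elem_bits
    full, tail = divmod(n_elements, lanes)
    return lanes, full, tail


features = 185  # row length from this issue
for vlen in (128, 256):  # illustrative VLEN values
    lanes, full, tail = vector_chunks(features, vlen)
    print(f"VLEN={vlen}: {lanes} lanes, {full} full iterations, {tail} tail element(s)")
```

Under either assumption the row length is not a multiple of the lane count, so every row leaves a one-element tail. Whether the generated code handles that tail with a shortened vector length or with a scalar epilogue is not established by this issue and would need to be confirmed from the generated assembly.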

Dec 09 '25 04:12 yanyanyanggg