[Bug] [RISC-V RVV] softmax operator shows suboptimal vectorization
Description
The softmax operator runs slower when compiled with the RISC‑V Vector (RVV) extension than without it: the RVV build reaches only 0.745× the performance of the scalar build (2.457 ms vs. 1.832 ms). This suggests that the code generated for softmax is not vectorized efficiently.
Steps to Reproduce
- Generate the softmax operator with the following configuration:
    params = {
        "dtype": "float32",
        "batch": 14,
        "features": 185
    }
- Export the operator to two targets:
  - RV target (scalar, without vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
  - RVV target (with vector extension):
    llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
- Run performance measurement on both targets.
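The two target strings in the steps above differ only in the trailing `+v` attribute. A minimal sketch of how they could be derived from one shared base so the two builds cannot drift apart; `BASE_TARGET` and `make_target` are hypothetical names for illustration and are not part of TVM or of the reporter's harness:

```python
# Flags shared by both builds, copied from the reproduction steps.
BASE_TARGET = (
    "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d "
    "-mattr=+64bit,+m,+a,+f,+d,+c"
)


def make_target(platform: str) -> str:
    """Return the target string for "rv" (scalar) or "rvv" (vector).

    Hypothetical helper: only string handling, no TVM calls.
    """
    if platform == "rv":
        return BASE_TARGET
    if platform == "rvv":
        # The vector build adds the V extension and nothing else.
        return BASE_TARGET + ",+v"
    raise ValueError(f"unknown platform: {platform!r}")


rv_target = make_target("rv")
rvv_target = make_target("rvv")
print(rv_target)
print(rvv_target)
```

Keeping everything else identical means any timing difference can be attributed to the `+v` attribute alone.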
Operator definition code:
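from tvm import relay

# export_op is a helper from the reporter's test harness; it is not defined in this issue.
# params must also contain "op_name", which the configuration above does not show.
def export_softmax(params, set_dir=None, platform="rv"):
    # 2D input of shape (batch, features)
    data = relay.var("data", shape=(params["batch"], params["features"]),
                     dtype=params["dtype"])
    softmax = relay.nn.softmax(data)  # softmax over the last axis (default axis=-1)
    export_op(softmax, params["op_name"], [data], params, set_dir=set_dir)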
Performance Data
- RV execution time: 1.831500 ms
- RVV execution time: 2.457040 ms
- Acceleration ratio (RV time / RVV time): 0.745 (the RVV build takes ~1.34× as long as the scalar build)
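For reference, the ratio can be recomputed from the two reported timings. This is plain arithmetic on the numbers above; the variable names are illustrative:

```python
rv_ms = 1.831500   # scalar (RV) execution time from this report
rvv_ms = 2.457040  # vector (RVV) execution time from this report

speedup = rv_ms / rvv_ms    # below 1.0 means the RVV build is slower
slowdown = rvv_ms / rv_ms   # how many times longer the RVV build takes

print(f"acceleration ratio (RV/RVV): {speedup:.3f}")   # 0.745
print(f"RVV time relative to RV:     {slowdown:.2f}x")  # 1.34x
```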
Environment Information
- TVM version: 0.19.0
- LLVM version: [Please provide: llvm-config --version]
- Hardware: Spacemit K1‑X bit‑brick board
- CPU: Spacemit X60 (8 cores, 1.6 GHz)
- ISA: rv64imafdcv (with vector extensions)
- Memory: 7.6 GB
- OS: Bianbu 2.2, Linux kernel 6.6.63
- Operation: Softmax on a 2D tensor of shape (14, 185)
Expected Behavior
RVV vectorization should provide a performance improvement over the scalar RV baseline for softmax operations, which involve reduction and elementwise operations that can be vectorized.
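To make the "reduction and elementwise" structure concrete, here is a minimal pure-Python reference of a numerically stable softmax split into its passes. It is a sketch of the standard algorithm, not the loop nest that TVM or LLVM actually generates for this operator; `softmax_row` and `softmax_2d` are illustrative names, and the input values are arbitrary:

```python
import math


def softmax_row(row):
    """Numerically stable softmax over one row, written as explicit passes."""
    row_max = max(row)                            # pass 1: max reduction
    exps = [math.exp(x - row_max) for x in row]   # pass 2: elementwise exp
    total = sum(exps)                             # pass 3: sum reduction
    return [e / total for e in exps]              # pass 4: elementwise divide


def softmax_2d(data):
    """Apply softmax independently to each row (the last axis)."""
    return [softmax_row(row) for row in data]


# Same shape as in this report; the values themselves are arbitrary test data.
batch, features = 14, 185
data = [[((i * features + j) % 7) * 0.5 for j in range(features)]
        for i in range(batch)]
out = softmax_2d(data)
print(len(out), len(out[0]))
```

Each row needs two reductions (max and sum) and two elementwise passes (exp and divide), so all four passes are candidates for vectorization across the 185 features.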
Additional Context
- The softmax operation is applied to a 2D tensor of shape (14, 185), which is a relatively small tensor compared to other operators tested.
- The slowdown, though less severe than for some other operators, still indicates that the vectorized implementation of softmax may be inefficient in its reduction and exponentiation steps.
- This is part of a broader pattern where multiple operators show performance degradation with RVV, suggesting potential issues with vectorization strategies for reduction and elementwise operations.
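One thing that may be worth checking for a row length of 185 is how it divides into vector lanes. The sketch below only does the arithmetic. The vector register width (`vlen_bits`) is an assumption that should be confirmed for the actual hardware, and `vector_split` is a hypothetical helper, not a TVM or LLVM API:

```python
def vector_split(n_elems: int, vlen_bits: int, elem_bits: int, lmul: int = 1):
    """Return (lanes, full_iterations, tail_elements) for one row.

    vlen_bits is the assumed vector register width; lmul is the RVV
    register grouping factor (1 means a single register per operand).
    """
    lanes = (vlen_bits * lmul) // elem_bits
    full, tail = divmod(n_elems, lanes)
    return lanes, full, tail


# float32 rows of 185 elements, under two assumed register widths.
for vlen in (128, 256):
    lanes, full, tail = vector_split(185, vlen, 32)
    print(f"VLEN={vlen}: {lanes} lanes, {full} full iterations, tail of {tail}")
```

Whether the leftover element is handled by a shorter vector iteration or by a scalar loop depends on the code LLVM generates, which this report does not show.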