[RFC] WebAssembly SIMD as a new dedicated target
pros
- wasm as a first-class citizen
- avoids any slow emulation of sse2 intrinsics
- brings native i8 widening multiply (mul long) and i16 dot product (i16-dp)
- brings fmadd and i8-u7-dp with relaxed-simd, which are crucial for neural network inference
- v8 implements the fp16 (half-precision) functions
cons
- a new target that needs LOTS OF new optimization code
to be investigated
- discovery of simd, relaxed-simd and half-precision https://github.com/Tencent/ncnn/issues/5902#issuecomment-2653120402
- test code coverage https://github.com/Tencent/ncnn/issues/5902#issuecomment-2655366538
resources
llvm wasm simd header https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/wasm_simd128.h
simd https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md
relaxed-simd https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md
half-precision https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md
runtime simd and relaxed-simd discovery
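```c
#include <stdio.h>
#include <emscripten.h>

// Probe module body: i32.const 0; i8x16.splat; i8x16.popcnt (a simd128 instruction).
EM_JS(int, check_simd_support, (), {
    return WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11]));
});

// Probe module body: two i8x16.splat values fed to i8x16.relaxed_swizzle (a relaxed-simd instruction).
EM_JS(int, check_relaxed_simd_support, (), {
    return WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,15,1,13,0,65,1,253,15,65,2,253,15,253,128,2,11]));
});

int main()
{
    // WebAssembly.validate() returns a boolean, which arrives here as 1 or 0.
    printf("simd = %d\n", check_simd_support());
    printf("relaxed-simd = %d\n", check_relaxed_simd_support());
    return 0;
}
```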
| app | simd | relaxed-simd |
|---|---|---|
| linux firefox | 1 | 0 |
| linux chrome | 1 | 1 |
| android browser | 1 | 1 |
| android wechat webview | 1 | 1 |
| android wechat work webview | 1 | 0 |
| macos safari | 1 | 0 |
| ios safari | 1 | 0 |
| ios wechat safari | 1 | 0 |
wasm code coverage
```shell
# compile with --coverage -fprofile-arcs -ftest-coverage -g
emcc test.cpp -c -o test.o --coverage -fprofile-arcs -ftest-coverage -g
emcc simplegcov.cpp -c -o simplegcov.o --coverage -fprofile-arcs -ftest-coverage -g

# link with --coverage -g -s FORCE_FILESYSTEM=1 -sEXIT_RUNTIME=1
# link with -lnodefs.js -lnoderawfs.js
emcc test.o simplegcov.o -o test.js -s EXPORTED_FUNCTIONS='["_main"]' --coverage -g -s FORCE_FILESYSTEM=1 -sEXIT_RUNTIME=1 -lnodefs.js -lnoderawfs.js

# run wasm
node test.js

# collect gcda and generate output html
lcov -d ./ -c --gcov-tool "$(pwd)/llvm-gcov" -o lcov.info --ignore-errors inconsistent
genhtml lcov.info -o ./output --ignore-errors inconsistent
```
https://github.com/Tencent/ncnn/pull/5903
the llvm-gcov wrapper
```shell
#!/bin/sh
exec llvm-cov gcov "$@"
```
In fact, even if simd/relaxed-simd discovery works, distributing a single wasm binary is still problematic. The wasm runtime validates and compiles the whole module when it is loaded, including all of the basic/simd/relaxed-simd bytecode it contains, so a single unsupported instruction makes compilation fail even if that code path would never run.
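One possible workaround, not something settled in this thread, is to build one wasm binary per feature level and let JavaScript decide which one to fetch before anything is compiled. The sketch below reuses the two probe modules from the discovery snippet. The file names and the `pickWasmBuild` helper are hypothetical, and the injectable `validate` parameter is there only so the selection logic can be exercised outside a browser.

```javascript
// Probe modules copied from the discovery snippet: each is a tiny wasm module
// that only validates when the runtime knows the corresponding instructions.
const SIMD_PROBE = new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11]);
const RELAXED_SIMD_PROBE = new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,15,1,13,0,65,1,253,15,65,2,253,15,253,128,2,11]);

// Hypothetical file names: one separately compiled binary per feature level.
const BUILDS = {
  basic: "ncnn-basic.wasm",
  simd: "ncnn-simd.wasm",
  relaxed: "ncnn-relaxed-simd.wasm",
};

// Returns the name of the most capable build the runtime can compile.
// `validate` defaults to WebAssembly.validate and can be replaced for testing.
function pickWasmBuild(validate = (bytes) => WebAssembly.validate(bytes)) {
  if (!validate(SIMD_PROBE)) {
    return BUILDS.basic;
  }
  if (!validate(RELAXED_SIMD_PROBE)) {
    return BUILDS.simd;
  }
  return BUILDS.relaxed;
}

// The chosen name would then be passed to whatever code fetches and
// instantiates the module, so unsupported bytecode is never loaded.
console.log("selected build:", pickWasmBuild());
```

The cost of this design is build and packaging complexity: every feature level becomes a separate artifact that has to be compiled, tested, and shipped.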