
[RFC] webassembly simd as new dedicated target

Status: open • nihui opened this issue • 3 comments

pros

  • wasm as a first-class citizen
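  • avoid the slow emulation of sse2 intrinsics
  • brings native i8 multiply-long (i16x8.extmul_low/high_i8x16_s) and i16 dot product (i32x4.dot_i16x8_s)
  • brings fused multiply-add (f32x4.relaxed_madd) and the i8-u7 dot product (i32x4.relaxed_dot_i8x16_i7x16_add_s) with relaxed-simd, both crucial for neural network inference
  • v8 implements fp16 functions (half-precision proposal)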
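The integer instructions named in the pros list can be made concrete with scalar reference models. The sketch below is plain C (it does not use wasm_simd128.h), so it runs anywhere; each loop iteration corresponds to one lane. The function names mirror the wasm instruction names but are hypothetical helpers, not ncnn or LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference models of three wasm SIMD instructions (hypothetical helpers,
   not ncnn or LLVM APIs). Each loop iteration corresponds to one lane. */

/* i16x8.extmul_low_i8x16_s ("i8 mul long"): sign-extend the low 8 i8 lanes to i16
   and multiply. |a * b| <= 128 * 128 = 16384, so the i16 result cannot overflow. */
static void extmul_low_i8x16_s(const int8_t a[16], const int8_t b[16], int16_t r[8])
{
    for (int i = 0; i < 8; i++)
        r[i] = (int16_t)((int16_t)a[i] * (int16_t)b[i]);
}

/* i32x4.dot_i16x8_s ("i16-dp"): multiply i16 lanes into i32 and add adjacent pairs. */
static void dot_i16x8_s(const int16_t a[8], const int16_t b[8], int32_t r[4])
{
    for (int i = 0; i < 4; i++)
    {
        int64_t s = (int64_t)a[2 * i] * b[2 * i] + (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* the only overflowing input, (-32768 * -32768) * 2, wraps to INT32_MIN */
        r[i] = (int32_t)(uint32_t)s;
    }
}

/* i32x4.relaxed_dot_i8x16_i7x16_add_s ("i8-u7-dp"): sum each group of four
   i8 * i7 products and add the i32 accumulator c. Engines may differ when a lane of b
   has its high bit set, so this model only accepts b lanes in [0, 127]. */
static void relaxed_dot_i8x16_i7x16_add_s(const int8_t a[16], const int8_t b[16],
                                          const int32_t c[4], int32_t r[4])
{
    for (int i = 0; i < 4; i++)
    {
        uint32_t s = (uint32_t)c[i]; /* unsigned: the accumulation wraps like i32.add */
        for (int j = 0; j < 4; j++)
        {
            assert(b[4 * i + j] >= 0);
            s += (uint32_t)((int32_t)a[4 * i + j] * b[4 * i + j]);
        }
        r[i] = (int32_t)s;
    }
}
```

Restricting the second operand to 7 bits is what makes the relaxed dot product safe to use: with b in [0, 127], each pair of products stays within the i16 range, so engines that compute it differently still agree. In real code the corresponding intrinsics from wasm_simd128.h (for example wasm_i32x4_dot_i16x8) would replace these loops.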

cons

  • a new target needs a LOT of new optimization code

to be investigated

  • discovery of simd, relaxed-simd and half-precision https://github.com/Tencent/ncnn/issues/5902#issuecomment-2653120402
  • test code coverage https://github.com/Tencent/ncnn/issues/5902#issuecomment-2655366538

resources

  • llvm wasm simd header https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/wasm_simd128.h
  • simd https://github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md
  • relaxed-simd https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md
  • half-precision https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Overview.md

nihui, Feb 12 '25

runtime simd and relaxed-simd discovery

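#include <stdio.h>
#include <emscripten.h>

// Each check hands a tiny hand-assembled wasm module to WebAssembly.validate().
// The module only validates if the runtime understands every opcode in it.

// func () -> v128 { i32.const 0; i8x16.splat; i8x16.popcnt }
EM_JS(int, check_simd_support, (), {
    return WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,10,1,8,0,65,0,253,15,253,98,11]));
});

// func () -> v128 { i32.const 1; i8x16.splat; i32.const 2; i8x16.splat; i8x16.relaxed_swizzle }
EM_JS(int, check_relaxed_simd_support, (), {
    return WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,5,1,96,0,1,123,3,2,1,0,10,15,1,13,0,65,1,253,15,65,2,253,15,253,128,2,11]));
});

int main()
{
    // 1 = supported, 0 = not supported
    printf("simd = %d\n", check_simd_support());
    printf("relaxed-simd = %d\n", check_relaxed_simd_support());

    return 0;
}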

Results (1 = supported, 0 = not supported):

app                          simd  relaxed-simd
linux firefox                1     0
linux chrome                 1     1
android browser              1     1
android wechat webview       1     1
android wechat work webview  1     0
macos safari                 1     0
ios safari                   1     0
ios wechat safari            1     0
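How to read the byte arrays in the discovery code: the first eight bytes are the "\0asm" magic and version 1, followed by a type section declaring a function that returns v128 (123 = 0x7b), a function section, and a code section. Inside the function body, 65 is i32.const and 253 (0xFD) is the SIMD prefix, which is followed by the instruction opcode encoded as unsigned LEB128: 15 is i8x16.splat, 98 is i8x16.popcnt, and the two bytes 128, 2 decode to 256, i8x16.relaxed_swizzle. A runtime that does not know an opcode rejects the module, so validate() returns false. The small decoder below is a sketch that checks this reading; read_uleb128 and simd_opcode are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcodes used by the two probe modules (values from the SIMD and relaxed-simd proposals). */
enum {
    OP_I8X16_SPLAT = 0x0f,
    OP_I8X16_POPCNT = 0x62,
    OP_I8X16_RELAXED_SWIZZLE = 0x100,
};

/* Decode an unsigned LEB128 value at p (at most 5 bytes for a u32);
   *len receives the number of bytes consumed. */
static uint32_t read_uleb128(const uint8_t *p, size_t *len)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        value |= (uint32_t)(byte & 0x7f) << shift; /* low 7 bits carry payload */
        shift += 7;
    } while ((byte & 0x80) && shift < 35);        /* high bit set = more bytes follow */
    *len = i;
    return value;
}

/* Return the SIMD opcode of the 0xFD-prefixed instruction at p, or -1 if the
   instruction at p is not a SIMD instruction. *len receives its encoded size. */
static int64_t simd_opcode(const uint8_t *p, size_t *len)
{
    assert(p != NULL && len != NULL);
    if (p[0] != 0xfd)
        return -1;
    size_t n;
    uint32_t op = read_uleb128(p + 1, &n);
    *len = 1 + n;
    return op;
}
```

Applied to the relaxed-simd probe body (65, 1, 253, 15, 65, 2, 253, 15, 253, 128, 2, 11), the decoder finds two i8x16.splat instructions and one three-byte i8x16.relaxed_swizzle. Relaxed-simd opcodes start at 0x100, so they always need a two-byte LEB128 encoding.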

nihui, Feb 12 '25

wasm code coverage

# compile with --coverage -fprofile-arcs -ftest-coverage -g
emcc test.cpp -c -o test.o --coverage -fprofile-arcs -ftest-coverage -g
emcc simplegcov.cpp -c -o simplegcov.o --coverage -fprofile-arcs -ftest-coverage -g

# link with --coverage -g -s FORCE_FILESYSTEM=1 -sEXIT_RUNTIME=1
# (EXIT_RUNTIME=1 makes the runtime run its exit handlers, which is when the .gcda counter files are written)
# link with -lnodefs.js -lnoderawfs.js
# (NODERAWFS maps file access to the host filesystem, so the .gcda files end up on disk)
emcc test.o simplegcov.o -o test.js -s EXPORTED_FUNCTIONS='["_main"]' --coverage -g -s FORCE_FILESYSTEM=1 -sEXIT_RUNTIME=1 -lnodefs.js -lnoderawfs.js

# run wasm under node; this writes the .gcda files
node test.js

# collect gcda through the llvm-gcov wrapper and generate the output html
lcov -d ./ -c --gcov-tool `pwd`/llvm-gcov -o lcov.info --ignore-errors inconsistent
genhtml lcov.info -o ./output --ignore-errors inconsistent

https://github.com/Tencent/ncnn/pull/5903

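the llvm-gcov wrapper passed to --gcov-tool above (lcov expects a single gcov executable, while LLVM provides it as the llvm-cov gcov subcommand; make the script executable with chmod +x)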

#!/bin/sh
exec llvm-cov gcov "$@"

nihui, Feb 13 '25

In fact, even though simd/relaxed-simd discovery is possible, distributing a single wasm binary is problematic. The wasm runtime compiles the whole module when loading it, so the basic, simd and relaxed-simd code paths are all validated and compiled, even if only one of them is used at runtime. On a runtime without relaxed-simd (or simd), those unsupported instructions make compilation of the entire module fail.
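One way around this, sketched under assumptions: build the library once per feature level (for example plain, -msimd128, and -msimd128 -mrelaxed-simd), run only the small probe modules from the discovery comment first, and then fetch the single matching binary, so the runtime never has to compile opcodes it does not support. The selection policy is shown in C for consistency with the rest of the thread; in a browser it would live in the JavaScript loader. The enum, the functions and the file names are all hypothetical.

```c
#include <assert.h>

/* Hypothetical build variants, each compiled separately so that no variant
   contains opcodes that a less capable runtime cannot compile. */
typedef enum { BUILD_BASIC, BUILD_SIMD, BUILD_RELAXED_SIMD } build_variant;

/* Pick the most capable variant, given the results of
   check_simd_support() and check_relaxed_simd_support(). */
static build_variant select_build(int has_simd, int has_relaxed_simd)
{
    assert(has_simd || !has_relaxed_simd); /* relaxed-simd builds on simd */
    if (has_relaxed_simd)
        return BUILD_RELAXED_SIMD;
    if (has_simd)
        return BUILD_SIMD;
    return BUILD_BASIC;
}

/* Hypothetical file name of each variant. */
static const char *build_filename(build_variant v)
{
    switch (v)
    {
    case BUILD_RELAXED_SIMD: return "ncnn-relaxed-simd.wasm";
    case BUILD_SIMD:         return "ncnn-simd.wasm";
    default:                 return "ncnn-basic.wasm";
    }
}
```

With the results table from the discovery comment, linux chrome and the android webviews with relaxed-simd would get the relaxed-simd build, while firefox, safari and wechat work webview would get the simd build. The cost is shipping and maintaining three binaries instead of one.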

nihui, Apr 15 '25