Guidance for wasm target vectorization?

Open jeremy-coleman opened this issue 4 years ago • 6 comments

Hi. I have a few questions / requests for guidance regarding the wasm target (for web).

Does the tinygo llvm pass try to do autovectorization on go code?

Does the tinygo clang pass on c imports do autovec like emscripten? Generally, I'm unclear on the roles of clang vs emscripten. I know emcc shims coreutils/sdl/etc and does some code size related stuff. I also know clang does the initial autovectorization, but is clang targeting something like sse4 and then emcc translates that to wasm simd? Or can clang autovec to a wasm target (and thus tinygo too)?

Does the tinygo llvm ir and any c/clang llvm ir get merged together before codegen happens? Or just kind of linked together? I am a complete novice here. (Please don't spend much effort answering this, I have a feeling an answer could potentially be nearly infinitely complex.)

Do the clang/llvm compiler args affect both the c and go output or are there individual configs for each? I have read the docs (several times), but I am still unclear on how it all fits together.

I guess the tldr is: if I want to write code that can be autovectorized into a wasm module, is tinygo with either go or c a good fit?

jeremy-coleman avatar Oct 21 '21 02:10 jeremy-coleman

TinyGo doesn't do any autovectorization. LLVM might do it, but only when the appropriate extensions are enabled.

Generally, I'm unclear on the roles of clang vs emscripten. I know emcc shims coreutils/sdl/etc and does some code size related stuff.

Emscripten is a whole compiler toolchain that includes Clang, wasm-opt (to optimize the resulting wasm), and lots of shims to translate some common APIs like SDL to web equivalents. So Clang is just a component of that.

To be clear:

  • LLVM is a compiler toolkit that provides all kinds of things useful for compilers (here is where autovectorization happens)
  • clang is a compiler (that uses LLVM): it converts C/C++ etc into object code
  • Emscripten is a cross compilation toolchain: it bundles clang and a load of other things together to make compiling for the web easy
  • TinyGo also uses LLVM (and sometimes Clang for C code) and also ships with some libraries, but those are of course very different as we're talking about Go here instead of C.

also know clang does the initial autovectorization, but is clang targeting something like sse4 then emcc translates that to wasm simd? Or can clang autovec to a wasm target (and thus tinygo too)?

Probably not SSE4, which is an x86 thing (not wasm). I'd guess that if the proper extension is enabled (+simd128) it will try to autovectorize some things.

Can you give a bit more background? Is there something you'd like to do but that's too slow at the moment?

aykevl avatar Oct 21 '21 12:10 aykevl

Hey, thanks for your response. I'm doing web graphics stuff atm, so perf and user experience are directly related. With web graphics often being cpu bound, simd could be a big win. I guess I just kind of generally always need moooarrr. Btw, I mentioned sse4 specifically just because I think I remember v8 has/had checks up to sse4, it may be here somewhere: https://source.chromium.org/chromium/chromium/src/+/main:v8/src/wasm/function-body-decoder-impl.h.

I know assemblyscript just enables the builtin simd opcodes when you enable simd (without any vectorizing). Since I have no idea how llvm works, I currently imagine it is equivalent to teaching acorn/babel new ast types, but not necessarily transforming anything. It would be really cool if tinygo could vectorize go code with the simd128 flag.

jeremy-coleman avatar Oct 21 '21 15:10 jeremy-coleman

You could try to add the flag -llvm-features=+simd128. I haven't tested it, but it might work. In general, TinyGo wasn't optimized for speed. So you will likely find many bottlenecks. If you aren't already, you should use -opt=2, which is like -O2 in GCC/Clang (the default is -opt=z, which is like -Os in GCC or -Oz in Clang). It can sometimes make a big difference.
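As a rough illustration of the kind of code this applies to, here is a minimal sketch of a simple element-wise loop, which is the usual shape an autovectorizer looks for. The function name `addScaled` and the sample data are made up for this example, and the build command in the comment only combines the flags mentioned above. Whether LLVM actually vectorizes this particular loop is not guaranteed; it depends on the TinyGo/LLVM version and on what the optimizer can prove about the bounds checks.

```go
package main

// Possible build command, combining the flags mentioned above
// (a sketch; adjust the output name and package path to your project):
//
//   tinygo build -o main.wasm -target=wasm -opt=2 -llvm-features=+simd128 .

// addScaled computes dst[i] = a*x[i] + y[i] for every index that is valid
// in all three slices. The name and signature are hypothetical.
func addScaled(dst, x, y []float32, a float32) {
	// Find the common length so that all three slices can be cut to it.
	n := len(dst)
	if len(x) < n {
		n = len(x)
	}
	if len(y) < n {
		n = len(y)
	}

	// Reslicing to the same length gives the optimizer a chance to drop
	// per-element bounds checks, which otherwise tend to block vectorization.
	dst, x, y = dst[:n], x[:n], y[:n]

	for i := range dst {
		dst[i] = a*x[i] + y[i]
	}
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	y := []float32{10, 20, 30, 40, 50, 60, 70, 80}
	dst := make([]float32, len(x))

	addScaled(dst, x, y, 2)

	for _, v := range dst {
		println(v)
	}
}
```

To see whether anything was vectorized, one option is to disassemble the resulting module with a WebAssembly tool such as wasm2wat or wasm-objdump and look for v128 instructions in the body of the loop.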

Btw, I mentioned sse4 specifically just because I think I remember v8 has/had checks up to sse4

Maybe they use SSE4 for something, but SSE is x86 only. It doesn't exist on ARM, MIPS, or WebAssembly. Only x86. However, it is very likely that they will convert WebAssembly SIMD instructions to SSE instructions.

aykevl avatar Oct 21 '21 20:10 aykevl

PS recent TinyGo is already using >1.0 (a.k.a. MVP) features, so optimizing for this in wasm could be useful. Notably SIMD is a part of the draft WebAssembly 2.0 Core spec. https://webassembly.github.io/spec/core/appendix/changes.html

codefromthecrypt avatar Sep 07 '22 08:09 codefromthecrypt

For anyone looking for how to optimize TinyGo binaries, the tinygo.org website will have a page for this with the next release: https://github.com/tinygo-org/tinygo-site/pull/287

aykevl avatar Sep 15 '22 14:09 aykevl

BTW I've confirmed that -llvm-features=+simd128 does generate some v128 wasm instructions.

rockwotj avatar Jan 26 '24 02:01 rockwotj