Compiled code and handmade WASM interop
Hello all,
After successfully compiling a project, I'd like to optimize it. The program has some x86 asm code which I have ported to wasm by writing the corresponding wast file. So far, so good.
My question is how to interop the compiled project with the handmade .wasm file:
- How to include the .wasm file as part of the linking phase?
- How to call the .wasm function from C? Should it be enough to declare an extern function in C?
- How to access, from WASM, the memory of structs allocated in C? I understand all allocations are done by a custom allocator in a single memory chunk. Is there some description of what is actually passed as a pointer to a wasm function? My understanding is that it should just be an offset into that memory chunk; is that correct? Can I assume that for
void func(uint8* src, uint8* dst, int n)
src and dst are, on the .wasm side, i32 offsets into the imported memory?
- When passing a shared memory, the size must be known beforehand. How do I retrieve it to generate the proper (import 'foo' 'bar' (memory 1 SIZE shared))? What should 'foo' and 'bar' be here to properly reference the heap?
Thanks for helping me understand how emscripten/llvm work at this level.
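On the pointer question: in C compiled for wasm32, a pointer is a 32-bit byte offset into the module's linear memory, so a handwritten function with that C signature receives src and dst as plain i32 values and dereferences them with loads such as i32.load8_u. Below is a minimal native model of this, assuming a single 64 KiB page (the wasm page size) held in an ordinary array; copy_u8 and memory are hypothetical names, not emscripten or Orc APIs. As for 'foo' 'bar': the emcc-built side module dumped later in this thread imports its memory as env.memory.

```c
#include <assert.h>
#include <stdint.h>

/* Native stand-in for one page of wasm linear memory (64 KiB). In a real
 * module this would be the memory the module imports or exports. */
#define WASM_PAGE_SIZE 65536u
static uint8_t memory[WASM_PAGE_SIZE];

/* Hypothetical kernel. On wasm32, the C prototype
 *     void copy_u8(uint8_t *src, uint8_t *dst, int n);
 * is lowered to (param i32 i32 i32): src and dst arrive as byte offsets
 * into linear memory. This model takes those offsets explicitly. */
static void copy_u8(uint32_t src, uint32_t dst, int32_t n)
{
    /* Real wasm traps on an out-of-bounds access; the model asserts. */
    assert(n >= 0);
    assert((uint64_t)src + (uint64_t)n <= WASM_PAGE_SIZE);
    assert((uint64_t)dst + (uint64_t)n <= WASM_PAGE_SIZE);

    for (int32_t i = 0; i < n; i++) {
        /* Corresponds to i32.load8_u from src + i, then i32.store8 to dst + i. */
        memory[dst + (uint32_t)i] = memory[src + (uint32_t)i];
    }
}
```

The C side would simply pass its pointers; the handwritten wasm sees the same numbers as i32 offsets.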
If you want to link your wasm assembly code into an emscripten project, then the simplest way to do this would be to write it using the LLVM assembly format and include it in your project as a .s or .S file. See the assembly files that are part of emscripten for examples of how to do this:
./system/lib/libc/emscripten_memset_bulkmem.S
./system/lib/libc/emscripten_memcpy_bulkmem.S
./system/lib/wasm_worker/wasm_worker_initialize.S
./system/lib/libunwind/src/UnwindRegistersSave.S
./system/lib/libunwind/src/UnwindRegistersRestore.S
./system/lib/pthread/emscripten_thread_state.S
./system/lib/compiler-rt/stack_limits.S
./system/lib/compiler-rt/emscripten_tempret.s
./system/lib/compiler-rt/stack_ops.S
Alternatively, if the project in question has C/C++ fallbacks for the x86 assembly, then using those would likely be simpler than trying to write handwritten wasm assembly.
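On whether an extern declaration is enough: with static linking, a C prototype is all the caller needs, and the linker resolves the symbol against whichever object file defines it, whether that object came from C or from a .S file. A minimal sketch of keeping a C fallback next to such a declaration; add_s16 and the HAVE_WASM_ASM flag are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The prototype is all the calling C code needs. With static linking, the
 * definition can come from any object file, e.g. a hypothetical add_s16.S
 * assembled by emcc. */
void add_s16(int16_t *dst, const int16_t *a, const int16_t *b, int n);

/* HAVE_WASM_ASM is a hypothetical build flag: when the assembly version is
 * not linked in, this portable C fallback defines the same symbol. */
#ifndef HAVE_WASM_ASM
void add_s16(int16_t *dst, const int16_t *a, const int16_t *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Element-wise 16-bit add; callers keep sums within int16_t range. */
        dst[i] = (int16_t)(a[i] + b[i]);
    }
}
#endif
```

Calling code just uses add_s16(dst, a, b, n); which definition it gets is decided at link time.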
Thanks @sbc100. Actually the code is written in WAST already which is much easier to code. I don't think writing llvm asm is an option here.
About the fallbacks, yes it does have it but I wanted to optimize it with v128 SIMD ops.
Checking emscripten sources, there seems to be some internal logic to pass the heap to the wasm module, like https://github.com/emscripten-core/emscripten/blob/41a730aae6536d1981238d3733e5b7691a0a64f5/src/runtime_shared.js#L64. I guess I can access that memory by importing it into my wasm module and accessing the shared memory buffer; am I correct?
Disassembling the generated emscripten wasm code, it is not crystal clear how the memory is accessed, but from the .js code it seems that's the way to do it.
Am I on the correct path?
> Thanks @sbc100. Actually the code is written in WAST already which is much easier to code. I don't think writing llvm asm is an option here.
Are you sure? Can you share the wast file so we can check it out together? I would hope it would be relatively easy to convert from one to the other in most cases.
> About the fallbacks, yes it does have it but I wanted to optimize it with v128 SIMD ops.
Another alternative would then be to write it using the wasm SIMD C intrinsics, but it sounds like you have already written the raw wast, so that is likely not attractive to you either.
> Checking emscripten sources, there seems to be some internal logic to pass the heap to the wasm module, like emscripten/src/runtime_shared.js line 64 (commit 41a730a): var b = wasmMemory.buffer; I guess I can access that memory by importing it into my wasm module and accessing the shared memory buffer; am I correct? Disassembling the generated emscripten wasm code, it is not crystal clear how the memory is accessed, but from the .js code it seems that's the way to do it.
> Am I on the correct path?
It sounds like you are proposing some kind of dynamic linking of two wasm modules, one produced by you directly and one produced by emscripten. While this may be feasible, it's certainly not easy, and it's not the simplest way to solve this kind of problem.
By far the simplest way to solve this (which will also lead to better performance) is to build your code as an object file and have emscripten link it into your program statically (i.e. at static link time). However, to produce an object file you really want to write your assembly in the llvm format. As well as being simple this will likely be the most performant option since it will allow wasm-opt to optimize the whole program as one.
> Are you sure? Can you share the wast file so we can check it out together? I would hope it would be relatively easy to convert from one to the other in most cases.
Sure, it is not ready yet. Once it is, I will share it.
> Another alternative would then be to write it using the wasm SIMD C intrinsics, but it sounds like you have already written the raw wast, so that is likely not attractive to you either.
That isn't an option either. To give you more context, I'm porting https://gitlab.freedesktop.org/gstreamer/orc/ to WASM by providing a WASM target. Orc is basically a loop optimizer using different SIMD instruction sets (mmx, sse, avx, avx512, neon, etc.), and it works either by generating assembly code to link statically with, or by generating actual machine code for JIT execution. Currently I'm on the assembly approach, which is easier to code, writing WAT. Later, once it works, I'll need to generate the actual WASM bytecode. The two approaches pose different challenges. I'm currently trying to understand emscripten/llvm internals to be able to glue Orc in there. In the end, with the static approach, I'll need to link against my new WASM (produced by wat2wasm) and provide a way to pass C variables to it. Maybe, from your comments, it will be more feasible to do the JIT directly and provide the glue myself; still, the same questions remain, as I don't know how to access the heap/pointers and provide them to the WASM module.
> It sounds like you are proposing some kind of dynamic linking of two wasm modules, one produced by you directly and one produced by emscripten. While this may be feasible, it's certainly not easy, and it's not the simplest way to solve this kind of problem.
Yes, it seems so.
> By far the simplest way to solve this (which will also lead to better performance) is to build your code as an object file and have emscripten link it into your program statically (i.e. at static link time). However, to produce an object file you really want to write your assembly in the llvm format. As well as being simple this will likely be the most performant option since it will allow wasm-opt to optimize the whole program as one.
I see, I understand now. I thought that the dynamic linking was against the wasm itself, not the intermediate object. Any other thoughts or source files I can check?
> I thought that the dynamic linking was against the wasm itself, not the intermediate object.
I'm afraid I don't quite understand the question. Can you elaborate?
> I'm afraid I don't quite understand the question. Can you elaborate?
I apologize, yes. You were referring to the "llvm asm" option as the easiest one, but given that it is not possible, I'm wondering what more complex ways to achieve this are, if any.
The more complex way that it sounds like you are proposing would be to somehow dynamically link a wasm module that was not built by emscripten with an emscripten-built module. To do this, I think you have two main choices:
- Build your main module with -sMAIN_MODULE=2 and then make your code look like an emscripten side module (basically just a normal wasm module with a dylink metadata section).
- Build statically, but then try to implement some kind of dynamic linking in userspace. This sounds like it would result in a lot of bespoke and fragile code, but it's certainly not impossible.
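To make the first option more concrete: per the tool-conventions DynamicLinking.md document, the metadata is a custom section named dylink.0 whose mem_info subsection (type 1, WASM_DYLINK_MEM_INFO) tells the loader how much memory and table space to reserve for the module. A minimal sketch that serializes such a section; the function names are hypothetical, and a real side module also needs imports like env.memory and env.__memory_base, as in the emcc output shown later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WASM_SECTION_CUSTOM  0x00
#define WASM_DYLINK_MEM_INFO 0x01

/* Encode value as unsigned LEB128 (varuint32) into buf; returns bytes used. */
static size_t write_uleb128(uint8_t *buf, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* continuation bit: more bytes follow */
        buf[n++] = byte;
    } while (value != 0);
    return n;
}

/* Serialize a "dylink.0" custom section containing only a mem_info
 * subsection. buf must hold at least 40 bytes. Returns bytes written. */
static size_t write_dylink0(uint8_t *buf, uint32_t mem_size, uint32_t mem_p2align,
                            uint32_t table_size, uint32_t table_p2align)
{
    static const char name[] = "dylink.0";
    const size_t name_len = sizeof name - 1;
    uint8_t payload[20], body[40];
    size_t p = 0, b = 0, out = 0;

    /* mem_info payload: four varuint32 fields, at most 5 bytes each */
    p += write_uleb128(payload + p, mem_size);
    p += write_uleb128(payload + p, mem_p2align);
    p += write_uleb128(payload + p, table_size);
    p += write_uleb128(payload + p, table_p2align);

    /* section body: length-prefixed name, then type, payload_len, payload */
    b += write_uleb128(body + b, (uint32_t)name_len);
    memcpy(body + b, name, name_len);
    b += name_len;
    body[b++] = WASM_DYLINK_MEM_INFO;
    b += write_uleb128(body + b, (uint32_t)p);
    memcpy(body + b, payload, p);
    b += p;
    assert(b <= sizeof body);

    /* section header: id 0 (custom), then the body size */
    buf[out++] = WASM_SECTION_CUSTOM;
    out += write_uleb128(buf + out, (uint32_t)b);
    memcpy(buf + out, body, b);
    return out + b;
}
```

With all four fields at zero, matching the emcc-built module dumped later in this thread, the section comes out as 17 bytes: 00 0f 08 "dylink.0" 01 04 00 00 00 00.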
I see, thanks for the information. I'll check https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md as it seems it describes the current ABI to load wasm modules. Thanks!
Answering my own questions after some investigation:
> How to include the .wasm file as part of the linking phase?
Per https://emscripten.org/docs/compiling/Dynamic-Linking.html#load-time-dynamic-linking, simply run
emcc -sMAIN_MODULE main.c libsomething.wasm
where libsomething.wasm is the already-built library. If you want to use emscripten to build it, use -sSIDE_MODULE.
> How to call the .wasm function from C? Should it be enough to declare an extern function in C?
As long as it is defined in the side module (library), it should be found.
I'm having issues with the relocatable feature. Running wat2wasm with the dynamic linking annotations explained in https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md and then running wasm-objdump gives me the following information:
libtest01.wasm: file format wasm 0x1
Section Details:
Type[2]:
- type[0] (i32) -> nil
- type[1] (i32, i32, i32, i32, i32) -> nil
Import[2]:
- func[0] sig=0 <console.log> <- console.log
- memory[0] pages: initial=1 <- memory.buffer
Function[1]:
- func[1] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[1]:
- func[1] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[1]:
- func[1] size=527 <orc_add2_rshift_sub_s16_11_op>
It seems wat2wasm is not honoring the annotation. Maybe a bug?
But doing a wat2wasm with -r gives me this
libtest01.wasm: file format wasm 0x1
Section Details:
Type[2]:
- type[0] (i32) -> nil
- type[1] (i32, i32, i32, i32, i32) -> nil
Import[2]:
- func[0] sig=0 <console.log> <- console.log
- memory[0] pages: initial=1 <- memory.buffer
Function[1]:
- func[1] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[1]:
- func[1] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[1]:
- func[1] size=527 <orc_add2_rshift_sub_s16_11_op>
Custom:
- name: "linking"
- symbol table [count=2]
- 0: F <console.log> func=0 [ undefined binding=global vis=default ]
- 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ exported no_strip binding=local vis=hidden ]
Compiling the main module with the side module gives me this output
emcc -sMAIN_MODULE test01.c libtest01.wasm -o test
error: undefined symbol: orc_add2_rshift_sub_s16_11_op (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _orc_add2_rshift_sub_s16_11_op may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors
emcc: error: '/home/jl/w/github/gst.wasm/build/gst.wasm_web_wasm32/emsdk/node/18.20.3_64bit/bin/node /home/jl/w/github/gst.wasm/build/gst.wasm_web_wasm32/emsdk/upstream/emscripten/src/compiler.mjs /tmp/tmpjnsw022t.json' failed (returned 1)
To check the wat2wasm output against emscripten's, I built test02.c (a naive implementation of the same symbol) with emcc -sSIDE_MODULE, which gives me
wasm-objdump -x a.out.wasm
a.out.wasm: file format wasm 0x1
Section Details:
Custom:
- name: "dylink.0"
- mem_size : 0
- mem_p2align : 0
- table_size : 0
- table_p2align: 0
Type[2]:
- type[0] () -> nil
- type[1] (i32, i32, i32, i32, i32) -> nil
Import[4]:
- global[0] i32 mutable=1 <- env.__stack_pointer
- global[1] i32 mutable=0 <- env.__memory_base
- global[2] i32 mutable=0 <- env.__table_base
- memory[0] pages: initial=0 <- env.memory
Function[3]:
- func[0] sig=0 <__wasm_call_ctors>
- func[1] sig=0 <__wasm_apply_data_relocs>
- func[2] sig=1 <orc_add2_rshift_sub_s16_11_op>
Export[3]:
- func[0] <__wasm_call_ctors> -> "__wasm_call_ctors"
- func[1] <__wasm_apply_data_relocs> -> "__wasm_apply_data_relocs"
- func[2] <orc_add2_rshift_sub_s16_11_op> -> "orc_add2_rshift_sub_s16_11_op"
Code[3]:
- func[0] size=2 <__wasm_call_ctors>
- func[1] size=2 <__wasm_apply_data_relocs>
- func[2] size=69 <orc_add2_rshift_sub_s16_11_op>
This is different from what wat2wasm produces. I'm a bit confused; maybe it is a version compatibility problem? Linking the main module against this new .wasm side module (the one generated by emcc itself) does work.
wat2wasm -r is designed to produce object files that can then be fed into the static linker. It does not produce emscripten dynamic libraries (side modules). (It is also very limited in what it can do and not well maintained/tested.)
I don't know of any way to build an emscripten dynamic library other than using emscripten itself (or perhaps using wasm-ld directly).
Even if you did find a way to make a dynamic library from your wat file, remember that dynamic linking comes at a cost, especially with wasm/emscripten: there is both a code size cost and a runtime cost compared to static linking. Unless you really need the code to be loaded dynamically, I would not recommend this approach.
> wat2wasm -r is designed to produce object files that can then be fed into the static linker. It does not produce emscripten dynamic libraries (side modules). (It is also very limited in what it can do and not well maintained/tested.)
And is it possible to feed emcc with an object file (.wasm file) that is generated from wat2wasm -r?
In my tests, I get
error: undefined symbol: orc_add2_rshift_sub_s16_11_op (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _orc_add2_rshift_sub_s16_11_op may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
Error: Aborting compilation due to previous errors
even though the symbol seems to be exported via the custom "linking" section.
Seems that my situation is similar to https://github.com/WebAssembly/wabt/issues/1658
> And is it possible to feed emcc with an object file (.wasm file) that is generated from wat2wasm -r?
It should work, but it would not be surprising to me if wat2wasm -r has bit-rotted. It's not well maintained or tested. I'm tempted to simply remove the -r feature, unless somebody (perhaps you?) wants to volunteer to maintain it.
> It should work, but it would not be surprising to me if wat2wasm -r has bit-rotted. It's not well maintained or tested. I'm tempted to simply remove the -r feature, unless somebody (perhaps you?) wants to volunteer to maintain it.
To be honest, I don't know where to start. Is it wrong behavior in emscripten regarding the "linking" custom section, or is wat2wasm not following https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md?
Can you share the object produced by wat2wasm -r (the one that should be defining orc_add2_rshift_sub_s16_11_op)? I can probably tell you what is wrong with it.
As we go down the rabbit hole, though, I would once again advise you to write your code in llvm assembly format to avoid this issue.
In addition to being easily convertible to a valid object file, the LLVM assembly format also has some advantages over wat, such as supporting the C preprocessor and symbolic names for your static data.
> Can you share the object produced by wat2wasm -r (the one that should be defining orc_add2_rshift_sub_s16_11_op)? I can probably tell you what is wrong with it.
I think I've found the issue, but I can't tell whether it is correct behavior or not. Basically,
for code like
(func (export "orc_add2_rshift_sub_s16_11_op") (param $d1 i32) (param $s1 i32) (param $s2 i32) (param $s3 i32) (param $n i32)
wat2wasm -r generates the following "linking" table
- 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ exported no_strip binding=local vis=hidden ]
But for the following code (without export)
(func $orc_add2_rshift_sub_s16_11_op (param $d1 i32) (param $s1 i32) (param $s2 i32) (param $s3 i32) (param $n i32)
The generated object file has
- 1: F <orc_add2_rshift_sub_s16_11_op> func=1 [ binding=global vis=default ]
The difference is on the binding and vis by just using the export statement. With the second form, it links correctly with Emscripten.
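That difference lives in the flags field of the symbol's entry in the linking section's symbol table. Per tool-conventions Linking.md, a symbol with local binding is not visible outside its own object file, so wasm-ld cannot use it to resolve the reference from test01.c and reports the symbol as undefined, while a defined symbol with flags 0 (global binding, default visibility) can satisfy it. A small sketch decoding those bits, with the flag values as listed in Linking.md; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Symbol flag bits as listed in WebAssembly tool-conventions Linking.md. */
#define WASM_SYM_BINDING_WEAK      0x01u
#define WASM_SYM_BINDING_LOCAL     0x02u
#define WASM_SYM_VISIBILITY_HIDDEN 0x04u
#define WASM_SYM_UNDEFINED         0x10u
#define WASM_SYM_EXPORTED          0x20u
#define WASM_SYM_NO_STRIP          0x80u

/* Binding as wasm-objdump prints it. Weak and local are mutually exclusive. */
static const char *binding_name(uint32_t flags)
{
    assert(!((flags & WASM_SYM_BINDING_WEAK) && (flags & WASM_SYM_BINDING_LOCAL)));
    if (flags & WASM_SYM_BINDING_LOCAL)
        return "local";
    if (flags & WASM_SYM_BINDING_WEAK)
        return "weak";
    return "global";
}

/* A symbol can satisfy references from other object files only if it is
 * defined here and its binding is not local. Hidden visibility does not
 * prevent this within a static link. */
static int resolvable_from_other_objects(uint32_t flags)
{
    return !(flags & WASM_SYM_UNDEFINED) && !(flags & WASM_SYM_BINDING_LOCAL);
}
```

With the export form the flags work out to 0xA6 (exported | no_strip | local | hidden), which is not resolvable from another object; the plain $name form shows no flags, i.e. 0.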
> As we go down the rabbit hole, though, I would once again advise you to write your code in llvm assembly format to avoid this issue. In addition to being easily convertible to a valid object file, the LLVM assembly format also has some advantages over wat, such as supporting the C preprocessor and symbolic names for your static data.
Yes, and I appreciate your patience and help with this topic. As my particular use case requires building a "compiler" myself, I'd like to understand the alternatives and how things work in more depth.
Just wanted to update this issue to avoid dangling issues on the system. An initial implementation can be found at https://gitlab.freedesktop.org/gstreamer/orc/-/merge_requests/234 in case someone needs some reference in the future.
After adding the corresponding "linking" custom section and following the visibility hints wabt uses, the code can be safely linked by emscripten and runs properly.
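For reference, here is a minimal sketch of emitting such a section: the body of a "linking" custom section (version 2) holding a WASM_SYMBOL_TABLE subsection (type 8) with a single defined function symbol, following the layout in tool-conventions Linking.md. The function and constant names are hypothetical, not taken from the Orc merge request. A real object file would list every symbol, imports included, as wat2wasm's symbol table above does, and would generally also need reloc.* sections for function and memory references in the code, which this sketch does not emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WASM_LINKING_VERSION 2
#define WASM_SYMBOL_TABLE    8  /* linking subsection type */
#define SYMTAB_FUNCTION      0  /* syminfo kind */

/* Encode value as unsigned LEB128 (varuint32) into buf; returns bytes used. */
static size_t write_uleb128(uint8_t *buf, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* continuation bit: more bytes follow */
        buf[n++] = byte;
    } while (value != 0);
    return n;
}

/* Write the contents of a "linking" custom section (everything after the
 * section name) with one defined function symbol. flags == 0 means global
 * binding and default visibility. Returns bytes written. */
static size_t write_linking_body(uint8_t *buf, size_t cap, uint32_t func_index,
                                 const char *name, uint32_t flags)
{
    uint8_t sym[128], count[5];
    size_t name_len = strlen(name), s = 0, out = 0;
    assert(name_len + 16 <= sizeof sym);

    /* syminfo for a defined function: kind, flags, index, name_len, name */
    sym[s++] = SYMTAB_FUNCTION;
    s += write_uleb128(sym + s, flags);
    s += write_uleb128(sym + s, func_index);
    s += write_uleb128(sym + s, (uint32_t)name_len);
    memcpy(sym + s, name, name_len);
    s += name_len;

    /* symbol table payload = symbol count followed by the syminfo entries */
    size_t c = write_uleb128(count, 1);
    assert(1 + 1 + 5 + c + s <= cap);

    out += write_uleb128(buf + out, WASM_LINKING_VERSION);
    buf[out++] = WASM_SYMBOL_TABLE;
    out += write_uleb128(buf + out, (uint32_t)(c + s)); /* payload_len */
    memcpy(buf + out, count, c);
    out += c;
    memcpy(buf + out, sym, s);
    return out + s;
}
```

The caller would wrap these bytes in a custom section: id 0, the section size, then the length-prefixed name "linking".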
So it looks like you ended up writing your own wasm object file writer? (Is that right? I don't see any .wat content in your PR) Very cool! I thought you were looking for a general purpose wat => object converter, which I was advising against, but producing wasm object file using your own pipeline seems reasonable to me. Nice work!
BTW that "spec" for the object file format is here: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md.
Indeed, it is a .wasm and .wat generator. I was looking for a proper text -> binary converter because it eases the development, and it is always easier to code in wat :)
Thanks for your help @sbc100