emscripten icon indicating copy to clipboard operation
emscripten copied to clipboard

"local count too large" with large generated C++ function in debug mode

Open skelly-energid opened this issue 2 years ago • 21 comments

Please include the following in your bug report:

Version of emscripten/emsdk: emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.34 (57b21b8fdcbe3ebb523178b79465254668eab408) clang version 17.0.0 (https://github.com/llvm/llvm-project a031f72187ce495b9faa4ccf99b1e901a3872f4b) Target: wasm32-unknown-emscripten Thread model: posix

Using this python script to generate the code:

#!/usr/bin/env python3

import os
import sys

code = ""

for i in range(0, 1200):
    code += f"""case {i}:
            otherfunc(arg2, "foo", local2, local1, local3, arg1, arg3, arg4, arg5, arg6, arg7, arg8);
            break;
    """

with open("generated.cpp", "w") as f:
    f.write(
f"""

void otherfunc (...) {{
}}

void somefunc(int arg1, int arg2, char arg3, int arg4, int arg5, int arg6, int arg7, int arg8) {{
  char local1 = 0;
  char local2 = 0;
  char local3 = 0;

  switch (arg4) {{
    {code}
    default:
      break;
  }}
}}

int main(int, char**)
{{
    somefunc(0, 0, 's', 0, 0, 0, 0, 0);
    return 0;
}}

""")

and this command line:

../generate.py && em++   -g -o main.html generated.cpp

Chrome fails with something like

main.js:870 Uncaught (in promise) RuntimeError: Aborted(CompileError: WebAssembly.instantiate(): Compiling function #2 failed: local count too large @+452)

Firefox fails with something like

Uncaught (in promise) RuntimeError: Aborted(CompileError: wasm validation error: at offset 455: too many locals)

I don't really know what is meant by locals here, because there are not many local variables.

skelly-energid avatar May 12 '23 11:05 skelly-energid

Does the problem go away when you run with -O1 or -O2?

Can you try also with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT? There are two possible causes here, firstly llvm could be generating a function with that many locals, secondly one of the binaryen optimizations could be causing it. If it still fails with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT then we would know it was llvm that was at fault. I'm not sure what we can do about that though.. it might be simplest to suggest that you break up your huge function.

sbc100 avatar May 12 '23 18:05 sbc100

Thanks, yes, it is only reproducible in debug mode. If I add -O1 I can't reproduce it anymore. I can still reproduce it without -O1 and with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT.

I've already broken up the generated function, and that has solved the problem for me. As you say though, this could still be a llvm bug.

skelly-energid avatar May 12 '23 20:05 skelly-energid

I got the same error when I enabled Asan via build parameter -fsanitize=address. What works is -fsanitize=undefined, though.

mike-lischke avatar Jul 21 '23 09:07 mike-lischke

@mike-lischke, does adding -sERROR_ON_WASM_CHANGES_AFTER_LINK and -sWASM_BIGINT fix the issue for you?

sbc100 avatar Jul 24 '23 09:07 sbc100

I'm on travel with limited internet connection. Will check when I return in 2 weeks.

mike-lischke avatar Jul 24 '23 19:07 mike-lischke

@sbc100 I have now tested your suggestion and get this error:

em++: error: changes to the wasm are required after link, but disallowed by ERROR_ON_WASM_CHANGES_AFTER_LINK: ['/opt/homebrew/Cellar/emscripten/3.1.43/libexec/binaryen/bin/wasm-emscripten-finalize', '--output-source-map-url=antlr4-runtime-wasm.wasm.map', '-g', '--bigint', '--no-dyncalls', '--no-legalize-javascript-ffi', '--dwarf', 'wasm/antlr4-runtime-wasm.wasm', '-o', 'wasm/antlr4-runtime-wasm.wasm']

Unclear why a change to the wasm file would be necessary. The full command line is:

em++ \ wasm/antlr4-runtime-wrapper.cpp \ antlr4-cpp-runtime/*.cpp\ antlr4-cpp-runtime/atn/*.cpp\ antlr4-cpp-runtime/dfa/*.cpp\ antlr4-cpp-runtime/internal/*.cpp\ antlr4-cpp-runtime/misc/*.cpp\ antlr4-cpp-runtime/support/*.cpp\ antlr4-cpp-runtime/tree/*.cpp\ antlr4-cpp-runtime/tree/pattern/*.cpp\ antlr4-cpp-runtime/tree/xpath/*.cpp\ -O0\ -std=c++17 -lembind -Iantlr4-cpp-runtime/ \ -fwasm-exceptions\ -o wasm/antlr4-runtime-wasm.js\ -s EXPORT_ES6=1\ -s WASM=1\ -s WASM_BIGINT=1\ -s ALLOW_MEMORY_GROWTH=1\ -s DEMANGLE_SUPPORT=1\ -g\ -gsource-map\ -s ERROR_ON_WASM_CHANGES_AFTER_LINK=1\ -fsanitize=address

mike-lischke avatar Aug 09 '23 12:08 mike-lischke

Can you try without -gsource-map?

sbc100 avatar Aug 09 '23 16:08 sbc100

Also without source maps the same error comes up (failed to asynchronously prepare wasm: CompileError: WebAssembly.instantiate(): Compiling function #77:"embind_init_main()" failed: local count too large @+51266).

mike-lischke avatar Aug 10 '23 10:08 mike-lischke

And this was with -sERROR_ON_WASM_CHANGES_AFTER_LINK? i.e. did you manage to link with that flag?

What other flags are you using? I know -fsanitize=address can cause a lot of extra locals to get added. We have to disable several tests under asan because of the local count too large" issue. Do you see the this error even without -fsanitize=address?

sbc100 avatar Aug 10 '23 16:08 sbc100

Yes, I could link without -gsource-map. The "local count too large" comes up when I run the test script. These are the flags I use now:

    -O0\
    -std=c++17 -lembind -Iantlr4-cpp-runtime/ \
    -o wasm/antlr4-runtime-wasm.js\
    -s EXPORT_ES6=1\
    -s WASM=1\
    -s WASM_BIGINT=1\
    -s ALLOW_MEMORY_GROWTH=1\
    -s DEMANGLE_SUPPORT=1\
    -s ERROR_ON_WASM_CHANGES_AFTER_LINK=1\
    -fwasm-exceptions\
    -fsanitize=address\
    -g

Using -fsanitize=undefined instead makes the app running, but then I cannot debug that memory alignment issue I have.

mike-lischke avatar Aug 11 '23 09:08 mike-lischke

I see, so this seems like an issue with -fsanitize=address resulting in too many locals being created. This will most likely require an upstream llvm change, and I'm not quite sure that it would look like.

BTW, do you have any information about the function in question (the one with too many locals): embind_init_main? Is it a particularly long or complex function?

One possible workaround here would be to split this function up at the source level into several smaller functions. In the case of EMSCRIPTEN_BINDINGS (which it looks like this is) you could create several blocks such as :

EMSCRIPTEN_BINDINGS(main1, {
...
}

EMSCRIPTEN_BINDINGS(main2) {
...
}
etc.

sbc100 avatar Aug 11 '23 15:08 sbc100

Somehow I anticipated this question :-) But no, this is not from my code. I searched my entire project and could not find it, not even in generated code. How much are "to many locals"? However, since I want to make all public classes available in JS/TS, I have bound many classes already. If the EMSCRIPTEN_BINDINGS is the culprit I can certainly structure that differently.

What are those names in that macro call actually? The documentation doesn't really explain them. Do they matter in some way?

Update: That was it! I split the main bindings (around 2000 lines, 53 classes) into 6 separated binding functions and now I can execute the code and get an error report:

SUMMARY: AddressSanitizer: heap-use-after-free (file:///Volumes/Extern/Work/projects/antlr4wasm/wasm/antlr4-runtime-wasm.wasm+0x84f1f1) in antlr4::atn::LexerATNSimulator::match(antlr4::CharStream*, unsigned long)+0x84f1f1

That's great! Many thanks for your help Sam!

mike-lischke avatar Aug 11 '23 15:08 mike-lischke

The name passed as the first argument to EMSCRIPTEN_BINDINGS doesn't really mean anything.. it just gives the generated function a unique name. So you can use whatever name you want as long as don't repeat names (since that will result in multiple symbols with the same name.

sbc100 avatar Aug 11 '23 15:08 sbc100

@sbc100 I have the same problem when compiling a huge 300k lines SWIG-generated C++ file. In my case the problem is very real - it happens in a function which registers all the exported classes which has 768 blocks of this type:

do {
 // register class and methods
} while(0);

Every block contains 4 local variables. When I use -O2 the compiler must optimize some (a large part?) of those variables and everything works. Otherwise I get the local count too large. Isn't it possible to configure this limit somewhere?

mmomtchev avatar Oct 11 '23 11:10 mmomtchev

I believe the local count too large error is coming from the wasm runtime (e.g. v8) and I don't think it configurable, no.

We could perhaps have binaryen (wasm-opt) and/or llvm try to limit the number of locals they allow themselves to use. It might be easier on the binaryen side but I'm not sure. @tlively @kripken WDTY?

In the short term I'm afraid that your best best is probably to attempt to break up your large function into smaller ones.

sbc100 avatar Oct 11 '23 16:10 sbc100

Binaryen does do this when it optimizes. I think that's why as reported above this works with -O2. It's just a problem in unoptimized builds.

As a workaround, you can do some manual optimization of the wasm like this: wasm-opt --simplify-locals --coalesce-locals a.wasm -o a.wasm, which in most cases should fix this (an entire run of -O would be even better, but much slower).

kripken avatar Oct 11 '23 19:10 kripken

@kripken Alas, wasm-opt produces a binary that segfaults on instantiateStreaming. I had to also add --enable-bulk-memory as directed.

Finally, building just that one source file with -O1 while the rest is with -O0 produces a working binary, so this is acceptable to me for debugging.

mmomtchev avatar Oct 14 '23 09:10 mmomtchev

I notice that llvm geps where the offset is a constant produce a local in the wat which is unneeded and optimised away with O1 . It would seem possible to eliminate that.

yowl avatar Jan 24 '24 02:01 yowl

I'm getting the same issue, the difference is that, if I enable your suggested flag @sbc100 ERROR_ON_WASM_CHANGES_AFTER_LINK I get the error: em++: error: changes to the wasm are required after link, but disallowed by ERROR_ON_WASM_CHANGES_AFTER_LINK: ['emsdk/upstream/bin/wasm-emscripten-finalize', '-g', '--bigint', '--no-legalize-javascript-ffi', '--check-stack-overflow', '../../bin/release/out.wasm', '-o', '../../bin/release/out.wasm']

Also, I notice I get this error when addind the ASYNCIFY option

kohakukun avatar Jun 20 '24 08:06 kohakukun

@kohakukun can you remove share the original link command that caused the failure? Along with the failure message itself? Does it happen if you add -O1?

sbc100 avatar Jun 20 '24 16:06 sbc100

I also have the problem with the ASYNCIFY flag

MikhailGorobets avatar Jul 09 '24 09:07 MikhailGorobets

I met same issue with ASYNCIFY too.

xbcnn avatar Aug 19 '24 09:08 xbcnn

I'm having the same issue after compiling including FFmpeg for debug. If I exclude the decoders, it works, but that is far from optimal.

	emcc ./lib/web-demuxer/*.c ./lib/web-demuxer/*.cpp \
		-lembind \
		-I./lib/FFmpeg \
		-L./lib/FFmpeg/libavformat -lavformat \
		-L./lib/FFmpeg/libavutil -lavutil \
		-L./lib/FFmpeg/libavcodec -lavcodec \
		--post-js ./lib/web-demuxer/post.js \
		-lworkerfs.js \
		-s EXPORT_ES6=1 \
		-s INVOKE_RUN=0 \
		-s ENVIRONMENT=worker \
		-s ASYNCIFY \
		-s ALLOW_MEMORY_GROWTH=1 \
	        -O0 \
	        -g \
	        --minify 0 \
	        --profiling \
	        -sWASM_BIGINT \
	        --closure-args=--debug

bartadaniel avatar Oct 25 '24 12:10 bartadaniel

Does the workaround of manually optimizing the wasm not work for anyone?

I see a comment about a "segfault on instantiateStreaming" in that case, but that sounds like a separate browser issue. If we can get a testcase for it, we can file a bug on the relevant browser.

Otherwise, there isn't anything more we can do: reducing the amount of locals is something optimizations do, so this browser limit on locals can be hit in debug builds, unfortunately. Always running wasm-opt even in debug builds is a possibility to avoid this, but it would make all debug builds slower, so I think that wouldn't be the best option. Instead, running wasm-opt manually on the rare cases this happens seems best.

kripken avatar Oct 25 '24 15:10 kripken

@kripken I successfully executed with -O2 or -O3. I got a warning Some VMs may not accept this binary because it has a large number of locals in function when I compiled the program that includes the relatively large function "without optimization" (i.e. -O0) and got the same error local found too large when I executed the generated wasm. So it works with optimization at least in my case (because optimization pass removed the many local variables?).

yomaytk avatar Nov 11 '24 08:11 yomaytk

I can compile with -O2 or -O3 but that makes run-time source-level debugging using the DWARF info wonky. I assume this is the expected behavior with all the heavy optimizations going on. It would be helpful to have an option to disable these hard limits in the browser during development to avoid the issue. I'll raise the question to the Chromium team.

edit: link to the chromium issue

bartadaniel avatar Dec 20 '24 10:12 bartadaniel