Build stalls on massive autogenerated C files
I've been experimenting with building various things with emscripten to relatively decent success, until recently.
The library in question is libpostal. After making a few changes (namely to have it use the clang WebAssembly SSE intrinsics, increasing the max wasm memory, etc), it hangs on building the enormous scanner.c - this file is autogenerated by re2c, but despite being enormous, non-emscripten builds take on the order of a few dozen seconds. Emscripten's build, however, steadily climbs in memory usage until it's killed from OOM. I've thrown 200GB of memory at it and run it overnight and it still never finished, eventually still dying to an OOM.
I get that the complexity of the file is quite high, but this still seems rather unusual, and I'm having trouble tracking down potential causes - I feel it may be a bug in the toolchain.
What part of the process is failing? Is it compiling or linking?
If it is linking (seems likely), what part of the link is failing? (You can add -v to your link command to get more information about the sub-processes, or build with EMCC_DEBUG=1 to get even more info.)
It's the compiling process (emcc is what is stalling) - everything works up to that enormous .c file. It's relied on by several pieces of libpostal, so it doesn't even get a chance to run the linker.
Here's the full command that's being run (as part of the makefile), if it helps:
/bin/bash ../libtool --tag=CC --mode=compile /home/.../LP/emsdk/upstream/emscripten/emcc -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -DLIBPOSTAL_DATA_DIR='"/usr/local/share/libpostal"' -g -msimd128 -sALLOW_MEMORY_GROWTH -sMAXIMUM_MEMORY=4gb -DUSE_SIMD -g -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c -o libscanner_la-scanner.lo `test -f 'scanner.c' || echo './'`scanner.c
In that case this is likely a clang issue since emcc will simply exec clang when compiling, and nothing more.
Can you confirm that it is the clang process that is stalling and eating all your memory? Can you add -v to your cflags to see the exact clang command that is run?
Does compiling the same C file for a different --target (or with no target at all, i.e. for your host system) not have the same issue?
Since the file in question has a huge amount of goto statements I imagine the LLVM pass that is taking all the resources is https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp.
Could you perhaps attach the pre-processed scanner.c (so that we can build it standalone with all the headers, etc)?
Since the file in question has a huge amount of goto statements I imagine the LLVM pass that is taking all the resources is https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp.
re2c does have some options to reduce the number of gotos, I believe; may be worth me experimenting regenerating it w/ some of those options to see how it behaves. The parser.c I have is the same as the one in libpostal above - I'll see about getting my changes put into a repo here in a bit (they're relatively minimal, at the moment, isolated to the autoconf scripts.)
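For comparing regenerated variants of the scanner, a rough tally of goto statements might be a useful first number. This is only a sketch: the function name is made up for illustration, it is a plain regex over the text (it does not skip comments or string literals, and it ignores computed gotos), and a lower count is not guaranteed to mean the wasm backend will cope any better.

```python
import re
import sys

# Matches plain `goto label;` statements only (not computed gotos).
GOTO_RE = re.compile(r"\bgoto\s+([A-Za-z_]\w*)\s*;")


def goto_stats(source):
    """Return a rough count of goto statements and distinct targets in C source text."""
    targets = GOTO_RE.findall(source)
    return {"gotos": len(targets), "distinct_targets": len(set(targets))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the generated C file as the only argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(goto_stats(handle.read()))
```

Running it against each regenerated scanner.c would at least show whether a given re2c option changes the amount of goto-based control flow at all.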
I can also see about pulling it out to try to isolate it for testing.
Can you add -v to your cflags to see the exact clang command that is run?
make[2]: Entering directory '/home/.../LP/libpostal-wasm/src'
cd .. && /bin/bash /home/.../LP/libpostal-wasm/missing automake-1.16 --foreign src/Makefile
cd .. && /bin/bash ./config.status src/Makefile depfiles
config.status: creating src/Makefile
config.status: executing depfiles commands
/bin/bash ../libtool --tag=CC --mode=compile /home/.../LP/emsdk/upstream/emscripten/emcc -DHAVE_CONFIG_H -I.. -I/usr/local/include -v -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -DLIBPOSTAL_DATA_DIR='"/usr/local/share/libpostal"' -g -msimd128 -sALLOW_MEMORY_GROWTH -sMAXIMUM_MEMORY=4gb -DUSE_SIMD -g -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c -o libscanner_la-scanner.lo `test -f 'scanner.c' || echo './'`scanner.c
libtool: compile: /home/.../LP/emsdk/upstream/emscripten/emcc -DHAVE_CONFIG_H -I.. -I/usr/local/include -v -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -DLIBPOSTAL_DATA_DIR=\"/usr/local/share/libpostal\" -g -msimd128 -sALLOW_MEMORY_GROWTH -sMAXIMUM_MEMORY=4gb -DUSE_SIMD -g -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fPIC -DPIC -o .libs/libscanner_la-scanner.o
emcc: warning: linker setting ignored during compilation: 'ALLOW_MEMORY_GROWTH' [-Wunused-command-line-argument]
emcc: warning: linker setting ignored during compilation: 'MAXIMUM_MEMORY' [-Wunused-command-line-argument]
/home/.../LP/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/home/.../LP/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -DHAVE_CONFIG_H -I.. -I/usr/local/include -v -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -DLIBPOSTAL_DATA_DIR="/usr/local/share/libpostal" -g3 -msimd128 -DUSE_SIMD -g3 -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fPIC -DPIC -o.libs/libscanner_la-scanner.o
clang version 21.0.0git (https:/github.com/llvm/llvm-project 9534d27e3321a3b9e6e79fe6328445575bf26b7b)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/.../LP/emsdk/upstream/bin
(in-process)
"/home/.../LP/emsdk/upstream/bin/clang-21" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name scanner.c -mrelocation-model pic -pic-level 2 -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-feature +mutable-globals -target-cpu generic -target-feature +simd128 -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/.../LP/libpostal-wasm/src -v -fcoverage-compilation-dir=/home/.../LP/libpostal-wasm/src -resource-dir /home/.../LP/emsdk/upstream/lib/clang/21 -dependency-file .deps/libscanner_la-scanner.Tpo -MT libscanner_la-scanner.lo -sys-header-deps -MP -D EMSCRIPTEN -D HAVE_CONFIG_H -I .. -I /usr/local/include -D "LIBPOSTAL_DATA_DIR=\"/usr/local/share/libpostal\"" -D USE_SIMD -D LIBPOSTAL_EXPORTS -D PIC -isysroot /home/.../LP/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /home/.../LP/emsdk/upstream/lib/clang/21/include -internal-isystem /home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include -O0 -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fignore-exceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o .libs/libscanner_la-scanner.o -x c scanner.c
clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
..
/usr/local/include
/home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
/home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include/compat
/home/.../LP/emsdk/upstream/lib/clang/21/include
/home/.../LP/emsdk/upstream/emscripten/cache/sysroot/include
Could you perhaps attach the pre-processed scanner.c (so that we can build it standalone with all the headers, etc).
To reproduce, pull my libpostal fork and run the following:
emconfigure ./bootstrap.sh
emconfigure ./configure --datadir=/tmp/libpostal-data
emmake make -j8
I'm still going to work on splitting out the parser, but in the meantime you can use this to get far enough into the build to where it gets unhappy.
I'm also interested in building libpostal with emscripten, and I'm having exactly the same problem. I'm using the following versions:
- emscripten 4.0.17, installed through homebrew
- libpostal 1.1.4
- running on macOS Tahoe 26.0.1, on an M1 Macbook Pro
I ran:
emconfigure ./bootstrap.sh
emconfigure ./configure --datadir=/opt/homebrew/share --disable-data-download
emmake make
After the build failed, I was able to isolate the failing line, run it individually (with the -v flag added), and reproduce the failure. The reproducible failure is here:
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/emcc -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -DLIBPOSTAL_DATA_DIR=\"/opt/homebrew/share/libpostal\" -g -g -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fno-common -DPIC -o .libs/libscanner_la-scanner.o -v
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare '-DLIBPOSTAL_DATA_DIR="/opt/homebrew/share/libpostal"' -g3 -g3 -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fno-common -DPIC -o.libs/libscanner_la-scanner.o -v
clang version 22.0.0git
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/bin
(in-process)
"/opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/bin/clang-22" -cc1 -triple wasm32-unknown-emscripten -O0 -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name scanner.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/Users/singingwolfboy/clones/libpostal/src -target-linker-version 1221.4 -v -fcoverage-compilation-dir=/Users/singingwolfboy/clones/libpostal/src -resource-dir /opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/lib/clang/22 -dependency-file .deps/libscanner_la-scanner.Tpo -MT libscanner_la-scanner.lo -sys-header-deps -MP -D EMSCRIPTEN -D HAVE_CONFIG_H -I .. -I /usr/local/include -D "LIBPOSTAL_DATA_DIR=\"/opt/homebrew/share/libpostal\"" -D LIBPOSTAL_EXPORTS -D PIC -isysroot /opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot -internal-isystem /opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/lib/clang/22/include -internal-isystem /opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include/wasm32-emscripten -internal-isystem /opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare -ferror-limit 19 -fmessage-length=80 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fignore-exceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o .libs/libscanner_la-scanner.o -x c scanner.c
clang -cc1 version 22.0.0git based upon LLVM 22.0.0git default target arm64-apple-darwin25.0.0
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
..
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include/fakesdl
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include/compat
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/lib/clang/22/include
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include
End of search list.
clang(14301,0x204af0800) malloc: Failed to allocate segment from range group - out of space
Let me know if there's any way I can help move this forward towards a resolution!
If it helps, the massive scanner.c file can be downloaded from the openvenues/libpostal repo. It's almost 6 MB of text!
Do you know how much memory clang is trying to use here? i.e. can you try compiling that file on a machine with more RAM and seeing what the peak usage of the clang process is? Is this number much bigger for Wasm compared to other clang targets? This can probably only be considered a Wasm/emscripten issue if clang is using a lot more memory for this target than for others.
I'm not a very experienced C programmer, so maybe I'm doing the wrong thing here. But I will say that I was able to make clang compile this file easily when I switched it to the arm64 target. For context, here's the command that hangs forever, due to the malloc issue:
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare '-DLIBPOSTAL_DATA_DIR="/opt/homebrew/share/libpostal"' -g3 -g3 -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fno-common -DPIC -o.libs/libscanner_la-scanner.o -v
And here's almost the same command that successfully compiles for the arm64 target:
/opt/homebrew/Cellar/emscripten/4.0.17/libexec/llvm/bin/clang -target arm64 -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wall -Wextra -Wno-unused-function -Wformat -Werror=format-security -Winit-self -Wno-sign-compare '-DLIBPOSTAL_DATA_DIR="/opt/homebrew/share/libpostal"' -g3 -g3 -O2 -O0 -D LIBPOSTAL_EXPORTS -MT libscanner_la-scanner.lo -MD -MP -MF .deps/libscanner_la-scanner.Tpo -c scanner.c -fno-common -DPIC -o.libs/libscanner_la-scanner.o -v -I/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include
There are only two differences between those two commands. First, I replaced -target wasm32-unknown-emscripten with -target arm64. Then, when I got an error saying 'stdio.h' file not found, I added -I/opt/homebrew/Cellar/emscripten/4.0.17/libexec/cache/sysroot/include to the end of the command; I think that was the correct thing to do to resolve the error, but I'm not 100% sure. Either way, that second command is able to successfully compile scanner.c on my system, and it only takes a second or two to do so!
I don't know how to measure the memory usage for this, but I can confidently say that it's a lot less, considering that the -target wasm32-unknown-emscripten version consistently runs out of memory after about a minute, while the -target arm64 version consistently succeeds without any memory issues after a second or two.
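One possible way to put a number on the memory usage is a small wrapper that runs the compile command as a child process and reads the peak resident set size from the operating system afterwards. The sketch below uses only the Python standard library; the function name and the optional timeout are illustrative assumptions, and it only works on Unix-like systems (Linux, macOS).

```python
import resource
import subprocess
import sys
import time


def run_and_measure(argv, timeout=None):
    """Run argv as a child process; return (exit_code, wall_seconds, peak_rss_bytes).

    exit_code is None if the child was killed because the timeout expired.
    """
    start = time.monotonic()
    try:
        code = subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        code = None  # subprocess.run has already killed and reaped the child
    elapsed = time.monotonic() - start
    # Peak resident set size of the largest child waited for so far.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kibibytes on Linux.
    if sys.platform != "darwin":
        peak *= 1024
    return code, elapsed, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    code, secs, peak = run_and_measure(sys.argv[1:])
    print(f"exit={code} wall={secs:.1f}s peak_rss={peak / 2**20:.0f} MiB",
          file=sys.stderr)
```

Saved to a file, it takes the full clang command as its arguments, once with -target wasm32-unknown-emscripten and once with -target arm64. Use a separate invocation per target, because the reported value is a high-water mark over all children of the wrapper process. If the wasm build exhausts memory before finishing, the figure is only a lower bound.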
Ok, this does look like some kind of pathological case in the LLVM backend for WebAssembly.
Can you open an issue upstream in llvm-project about this? Please attach the full pre-processed source to that issue, along with the command you used to trigger the pathological behavior.
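For producing the pre-processed source, one approach is to take the exact clang command printed by -v and rewrite it so that it stops after preprocessing (clang's -E flag) instead of compiling. The helper below is a sketch of that rewrite: the function name and the default output name scanner.i are made up for illustration, and it only strips the dependency-file and output flags that appear in the commands earlier in this thread.

```python
import shlex
import sys


def preprocess_argv(compile_argv, output="scanner.i"):
    """Rewrite a `clang ... -c file.c -o file.o` argv into a preprocess-only argv."""
    drop_with_value = {"-MF", "-MT", "-MQ", "-o"}  # flag plus its separate value
    drop_alone = {"-c", "-MD", "-MMD", "-MP"}      # flags without a value
    result = []
    skip_next = False
    for arg in compile_argv:
        if skip_next:
            skip_next = False
            continue
        if arg in drop_with_value:
            skip_next = True
            continue
        if arg in drop_alone:
            continue
        # Joined form as seen in the logs above, e.g. -o.libs/foo.o
        if arg.startswith("-o") and arg.endswith((".o", ".lo")):
            continue
        result.append(arg)
    # -E stops after preprocessing; the output is a single standalone C file.
    return result + ["-E", "-o", output]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the original compile command as one quoted string.
    print(shlex.join(preprocess_argv(shlex.split(sys.argv[1]))))
```

The resulting scanner.i should then compile on its own with the same -target and -O flags and no include paths, which keeps the upstream reproduction down to one file and one command.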