Thoughts on memory usage reduction
When submitting pull request #203, I used my BeagleBone Black (running Debian) to build shecc targeting ARMv7.
However, I found that the current memory usage is too high to build the stage 2 compiler on my BeagleBone Black. As the log below shows, the kernel's OOM killer terminates the stage 1 compiler.
debian@BeagleBone:~/shecc$ make
env printf "ARCH=arm" > .session.mk
Target machine code switch to arm
Warning: missing packages: dot jq
Warning: Please check package installation
CC+LD out/inliner
GEN out/libc.inc
CC out/src/main.o
LD out/shecc
SHECC out/shecc-stage1.elf
SHECC out/shecc-stage2.elf
[ 1830.930466] Out of memory: Killed process 1995 (shecc-stage1.el) total-vm:1875560kB, anon-rss:429492kB, file-rss:148kB, shmem-rss:0kB, UID:1000 pgtables:1832kB oom_score_adj:0
make: *** [Makefile:115: out/shecc-stage2.elf] Killed
It would be better to reduce the memory usage so that the build process can complete even on low-memory ARM machines.
The improvement is still a work in progress because I have run into a troublesome problem: after enhancing the malloc/free routines, the bootstrap process crashes with a segmentation fault of unknown cause while building the stage 2 compiler.
@DrXiao, please check whether #226 works on your BeagleBone Black running Debian GNU/Linux.
I tried building shecc on my BeagleBone Black again.
Initially, I checked out commit 8f2b234 to build shecc, and bootstrapping failed as expected. However, when I switched to the latest commit on the master branch (1fb9fa5), bootstrapping completed successfully on my board.
Therefore, I used `git bisect` to identify the commit that implicitly resolved the memory issue, and found that bootstrapping started to succeed after commit 4be720b was merged into the master branch.
Next, I will analyze the memory usage of 8f2b234, 1fb9fa5, 4be720b, and #226.
I used my board to analyze memory usage while building the stage 2 compiler and obtained the following statistics.
For each test, I switched to a specific commit or branch and ran `/usr/bin/time -v out/shecc-stage1.elf -o shecc-stage2.elf src/main.c` to measure execution time and memory usage.
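The figures below come from GNU time's `-v` mode. If one wanted to collect a comparable peak-RSS number from a small C harness (for example, to script repeated runs), a minimal sketch could fork the command, reap it with `wait4()`, and read `ru_maxrss` from the returned `struct rusage`, which Linux reports in kilobytes. The function name `run_and_measure_max_rss` is hypothetical and not part of shecc.

```c
#define _GNU_SOURCE /* exposes wait4() in glibc headers */
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) with arguments argv, wait for it to finish,
 * and return its peak resident set size in kilobytes (Linux reports
 * ru_maxrss in KiB). The raw wait status is stored in *status so the caller
 * can tell a normal exit from a kill by signal, e.g. the OOM killer's SIGKILL.
 * Returns -1 if fork() or wait4() fails. */
long run_and_measure_max_rss(char *const argv[], int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    struct rusage usage;
    if (wait4(pid, status, 0, &usage) < 0)
        return -1;
    return usage.ru_maxrss;
}
```

For instance, passing `{"out/shecc-stage1.elf", "-o", "shecc-stage2.elf", "src/main.c", NULL}` would measure the same command as above; if the process is OOM-killed, `WIFSIGNALED(status)` is true and `WTERMSIG(status)` is `SIGKILL`.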
test result - 8f2b234 (failed to build the stage 2 compiler)
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 9.12
System time (seconds): 3.26
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.80
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 434808
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 7
Minor (reclaiming a frame) page faults: 108816
Voluntary context switches: 22
Involuntary context switches: 463
Swaps: 0
File system inputs: 1488
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
test result - 4be720b
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.51
System time (seconds): 6.37
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.92
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 425540
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 113529
Voluntary context switches: 16
Involuntary context switches: 155
Swaps: 0
File system inputs: 744
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
test result - 1fb9fa5
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.31
System time (seconds): 3.91
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.23
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 233048
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 65502
Voluntary context switches: 2
Involuntary context switches: 95
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
test result - improve-alloc (#226)
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.55
System time (seconds): 6.33
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.98
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 425532
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 113732
Voluntary context switches: 73
Involuntary context switches: 219
Swaps: 0
File system inputs: 1152
File system outputs: 456
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Conclusion
| commit / branch | Max RSS (kbytes) | Build succeeded? |
|---|---|---|
| 8f2b234 | 434808 | No |
| 4be720b | 425540 | Yes |
| 1fb9fa5 | 233048 | Yes |
| improve-alloc (#226) | 425532 | Yes |
Unexpectedly, improve-alloc (#226) does not noticeably reduce memory usage; it seems that a commit between 4be720b and 1fb9fa5 implicitly contributed to the reduction. I will try to identify that commit later.
I forgot to rebase; #226 should now behave better. You can measure the elapsed time as well.
By the way, you don't have to paste hyperlinks when mentioning issues or commits. Instead, simply write #226 and 4be720b94cb1df3c50a4b55845792ccca05f1a63, and GitHub will link them automatically.
@DrXiao, you can use GitHub Action for Continuous Benchmarking to track the elapsed time and memory usage of each commit, as rv32emu does.
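Assuming the action mentioned above is benchmark-action/github-action-benchmark, its `customSmallerIsBetter` mode reads a JSON array of objects with `name`, `unit`, and `value` fields. A small helper along these lines could emit the Max RSS and elapsed time in that shape. The struct and function names are hypothetical, and string fields are assumed to need no JSON escaping.

```c
#include <stdio.h>

/* One measurement; for both metrics used here, a smaller value is better. */
struct bench_sample {
    const char *name; /* hypothetical label, e.g. "stage2 max RSS" */
    const char *unit; /* e.g. "KB" or "s" */
    double value;
};

/* Write n samples as a JSON array of {"name", "unit", "value"} objects.
 * Assumes name and unit contain no characters that need JSON escaping.
 * Returns 0 on success, -1 on a write error. */
int write_bench_json(FILE *out, const struct bench_sample *s, size_t n)
{
    if (fputs("[\n", out) == EOF)
        return -1;
    for (size_t i = 0; i < n; i++) {
        /* %.10g keeps integers like 237332 exact and prints 6.48 as "6.48" */
        if (fprintf(out,
                    "  {\"name\": \"%s\", \"unit\": \"%s\", \"value\": %.10g}%s\n",
                    s[i].name, s[i].unit, s[i].value,
                    i + 1 < n ? "," : "") < 0)
            return -1;
    }
    return fputs("]\n", out) == EOF ? -1 : 0;
}
```

A CI job could run the stage 1 compiler, fill in the samples, and write the JSON file that the workflow then hands to the action.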
test result - #226 (switch to 8adfe48 to build)
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.40
System time (seconds): 3.98
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.40
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 234800
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 65991
Voluntary context switches: 1
Involuntary context switches: 118
Swaps: 0
File system inputs: 0
File system outputs: 456
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
| commit / pull request | Max RSS (kbytes) | Elapsed time (sec) | Build succeeded? |
|---|---|---|---|
| 8f2b234 | 434808 | 12.80 | No |
| 4be720b | 425540 | 8.92 | Yes |
| 1fb9fa5 | 233048 | 6.23 | Yes |
| #226 | 234800 | 6.40 | Yes |
Although #226 has been updated, its memory usage is not noticeably better than that of 1fb9fa5.
Therefore, I used `git bisect` again to identify the commits that significantly reduced memory usage, and found the following two:
- 7286ab7: Max RSS = 326876 kbytes
- 30f635f: Max RSS = 233048 kbytes
Then, I noticed that 7286ab7 and 30f635f use the arena allocator to allocate space for several kinds of objects, such as macros and symbols. I believe the root cause is that these are small objects that were previously allocated individually with malloc(), which allocates a relatively large chunk of memory for each call.
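To illustrate why grouping small objects into an arena can lower peak memory, here is a minimal bump-pointer arena sketch in C. It is not shecc's actual arena allocator; `arena_t`, `arena_alloc()`, and `arena_free()` are hypothetical names, and the code assumes a C11 compiler (for `max_align_t` and `_Alignof`). Small requests are carved out of one large malloc()'d block, so the per-call overhead is paid once per block instead of once per object.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A chunk of memory from which many small objects are carved. */
typedef struct arena_block {
    struct arena_block *next; /* previously filled block */
    size_t capacity;          /* usable bytes in data[] */
    size_t used;              /* bytes already handed out */
    max_align_t data[];       /* storage, aligned for any object type */
} arena_block_t;

typedef struct {
    arena_block_t *head; /* block currently being filled */
    size_t block_size;   /* default capacity of new blocks */
} arena_t;

void arena_init(arena_t *a, size_t block_size)
{
    a->head = NULL;
    a->block_size = block_size;
}

/* Bump-pointer allocation: round the request up to max_align_t alignment
 * and take it from the current block; call malloc() only when it is full. */
void *arena_alloc(arena_t *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > SIZE_MAX - align)
        return NULL;
    size = (size + align - 1) & ~(align - 1);

    arena_block_t *b = a->head;
    if (!b || b->capacity - b->used < size) {
        size_t cap = size > a->block_size ? size : a->block_size;
        if (cap > SIZE_MAX - sizeof(*b))
            return NULL;
        b = malloc(sizeof(*b) + cap);
        if (!b)
            return NULL;
        b->next = a->head;
        b->capacity = cap;
        b->used = 0;
        a->head = b;
    }

    void *p = (char *) b->data + b->used;
    b->used += size;
    return p;
}

/* Release every block at once; objects are never freed individually. */
void arena_free(arena_t *a)
{
    arena_block_t *b = a->head;
    while (b) {
        arena_block_t *next = b->next;
        free(b);
        b = next;
    }
    a->head = NULL;
}
```

The trade-off is that objects cannot be freed one by one: everything is released together by `arena_free()`, which suits data that stays alive for most of a compilation run (an assumption about how such objects are used). When a request does not fit, the unused tail of the current block is simply abandoned.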
Regarding the suggestion to use GitHub Action for Continuous Benchmarking, I've created issue #236 to track it.
test result - master branch (396a595)
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.46
System time (seconds): 4.01
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.48
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 237332
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 66655
Voluntary context switches: 1
Involuntary context switches: 135
Swaps: 0
File system inputs: 0
File system outputs: 464
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
test result - #226 (switch to 55e9569 to build)
Command being timed: "out/shecc-stage1.elf -o shecc-stage2.elf src/main.c"
User time (seconds): 2.48
System time (seconds): 4.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.50
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 238444
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 66924
Voluntary context switches: 1
Involuntary context switches: 117
Swaps: 0
File system inputs: 0
File system outputs: 464
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
| commit / branch / pull request | Max RSS (kbytes) | Elapsed time (sec) | Build succeeded? |
|---|---|---|---|
| 8f2b234 | 434808 | 12.80 | No |
| 4be720b | 425540 | 8.92 | Yes |
| 1fb9fa5 | 233048 | 6.23 | Yes |
| master (396a595) | 237332 | 6.48 | Yes |
| #226 | 238444 | 6.50 | Yes |
After fixing the type array allocation issue, the updated #226 now completes the build successfully on my BeagleBone Black. The comparison table has been updated accordingly.
I have narrowed the scope of #226 to hardening the malloc implementation rather than reducing memory usage.
Since memory usage has already been reduced by approximately 45% (from 434808 kbytes to 237332 kbytes) by previously merged commits, I think this issue can be closed.
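For reference, the 45% figure follows from the Max RSS of 8f2b234 and master (396a595) in the comparison table:

```math
\frac{434808 - 237332}{434808} = \frac{197476}{434808} \approx 0.454 \approx 45\%
```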