bevy icon indicating copy to clipboard operation
bevy copied to clipboard

bevy_gizmos doc tests hang compute when run locally

Open alice-i-cecile opened this issue 1 year ago • 5 comments

Bevy version

https://github.com/bevyengine/bevy/commit/b24ab2e9fba5b81aba15da97b6551a4e99aad40c

[Optional] Relevant system information

Windows

What you did

Run cargo run -p ci --doc.

What went wrong

Doc tests for initial crates pass, and then a dozen or so doc tests within bevy_gizmos get stuck, producing a warning that they've been running for more than 60 seconds.

My computer then runs out of RAM (over 12 GB!), freezes, and I have to hard reboot.

Additional information

Running with fewer threads via e.g. -j 4 improves responsiveness and slows down time-to-hang, but does not resolve the problem.

This works fine in CI, so it may be OS specific.

alice-i-cecile avatar Feb 29 '24 18:02 alice-i-cecile

This issue is related to 'Microsoft® Incremental Linker' - the linker used by the rust compiler on some (most?) windows machines. The linkers in the following image use 18.8 GB of RAM! gizmos-doc-test-task-manager

The one thing that sets bevy_gizmos apart from other crates like bevy_ecs is the increased "usage" of 'Microsoft® Incremental Linker'. Most other doc-tests also use 'Microsoft® Incremental Linker' but it does not use alot of memory there. Instead 'link.exe' uses more memory but remains within acceptable bounds (<1GB). In addition, bevy_gizmos uses bevy_ecs::system::assert_is_system(system); alot in doc-tests. However, since other crates like bevy_ecs make excessive use of this assertion aswell and do not have this issue, I am not sure this is the cause of the problem.

No single doc-test in particular seems to be responsible for the increased memory usage, as removing any given set of doc-tests results in a decrease in memory usage more or less proportional to the amount of tests removed. Instead, running doc-tests for (very) large projects on windows machines seems to be the issue.

I was unable to find an "in code" fix for this, but there is a discussion on the rust forums that appears to suggest building for a different but compatible target or using rust-lld as the linker.

However, this problem does not seem to be limited to Windows only, as #11034 details a very similar (the same?) issue on a Linux machine.

lynn-lumen avatar Apr 17 '24 23:04 lynn-lumen

Further discussion on Discord.

For some reason --jobs 1 doesn't seem to actually limit the amount of tests being compiled, but -- --test-threads=1 does. It also seems that the linker is invoked for each test, and for bevy_gizmos tests it takes almost 2GB per invocation. By default cargo runs 16 of them (one per thread), making me quickly run out of RAM. I wonder why it doesn't happen with the earlier tests though

General conclusions seem to be the same as above, although it looks like https://github.com/rust-lang/rust/pull/123974 should resolve this.

alice-i-cecile avatar May 28 '24 12:05 alice-i-cecile

In addition to the previous observation, rustdoc doesn't respect the linker option in Cargo's config.toml, you need to explicitly set it in the rustdocflags:

[target.x86_64-pc-windows-msvc]
rustdocflags = ["-Clinker=rust-lld.exe"]

This helps reduce the memory usage to 1/2 or 1/3, which is a huge improvement if it leads to avoiding swapping (depends on how much RAM your system has).

SkiFire13 avatar May 28 '24 12:05 SkiFire13

https://github.com/rust-lang/rust/pull/126245 has been merged, though unfortunately it will be enabled only with the 2024 edition.

SkiFire13 avatar Aug 14 '24 11:08 SkiFire13

Same issue on Fedora 40. Running cargo ci creates many instances of ld which consume all available memory and cause a system crash. Setting rustdocflags to use lld fixed the problem. The nightly release also runs successfully since it uses rust-lld.

Rust version info

rustc 1.80.1 (3f5fd8dd4 2024-08-06)
binary: rustc
commit-hash: 3f5fd8dd41153bc5fdca9427e9e05be2c767ba23
commit-date: 2024-08-06
host: x86_64-unknown-linux-gnu
release: 1.80.1
LLVM version: 18.1.7

config.toml

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-Clink-arg=-fuse-ld=lld"]
rustdocflags = ["-Clink-arg=-fuse-ld=lld"]

roblesch avatar Aug 20 '24 21:08 roblesch

12th Gen Intel(R) Core(TM) i7-12700H

cargo test --doc

test result: ok. 45 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.96s

With cargo run -p ci -- doc there is an entirely different story. A whole bunch of crates pass and then...

   Doc-tests bevy_gizmos

running 45 tests
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::arc_2d (line 34) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::arc_3d (line 154) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::long_arc_2d_between (line 386) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::long_arc_3d_between (line 253) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::short_arc_2d_between (line 340) has been running for over 60 seconds
test crates\bevy_gizmos\src\arcs.rs - arcs::Gizmos<'w,'s,Config,Clear>::short_arc_3d_between (line 207) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::ArrowBuilder<'_,'_,'_,Config,Clear>::with_tip_length (line 37) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::arrow (line 114) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::arrow_2d (line 145) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::axes (line 175) has been running for over 60 seconds
test crates\bevy_gizmos\src\arrows.rs - arrows::Gizmos<'w,'s,Config,Clear>::axes_2d (line 208) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::circle (line 116) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::circle_2d (line 157) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::ellipse (line 36) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::ellipse_2d (line 77) has been running for over 60 seconds
test crates\bevy_gizmos\src\circles.rs - circles::Gizmos<'w,'s,Config,Clear>::sphere (line 199) has been running for over 60 seconds
test crates\bevy_gizmos\src\cross.rs - cross::Gizmos<'_,'_,Config>::cross (line 24) has been running for over 60 seconds
test crates\bevy_gizmos\src\cross.rs - cross::Gizmos<'_,'_,Config>::cross_2d (line 59) has been running for over 60 seconds
test crates\bevy_gizmos\src\curves.rs - curves::Gizmos<'w,'s,Config,Clear>::curve_2d (line 30) has been running for over 60 seconds
test crates\bevy_gizmos\src\curves.rs - curves::Gizmos<'w,'s,Config,Clear>::curve_3d (line 64) has been running for over 60 seconds

It uses all 32 GB of RAM and causes the system to freeze up/soft crash after about 5-10 minutes.

BenjaminBrienen avatar Oct 28 '24 22:10 BenjaminBrienen

This appears to be fixed in latest nightly. It's not fixed for me in stable or beta. Which is odd as the fix is supposed to be in stable and I can't see anything in the beta/nightly release notes.

With the fix my ci -- doc-test time goes from 354s to 101s 🚀

One tidbit I noticed while testing - bevy_ecs is an exception and falls back to compiling the tests separately. Probably not a big deal as it only costs 20s or so. But if anyone fancies investigating then take a look at rust/src/librustdoc/doctest.rs/run_test.

Nightly: 1.87.0-nightly (2f581937e 2025-02-28) Beta: 1.86.0-beta.3 (42212a5c4 2025-02-27) Stable: 1.85.0 (4d91de4e4 2025-02-17)

greeble-dev avatar Feb 28 '25 18:02 greeble-dev

https://github.com/rust-lang/rust/issues/137898

https://users.rust-lang.org/t/question-about-doctest-speedup-in-rust-2024-edition/126431

🤦 🤦 🤦

notriddle avatar Mar 02 '25 21:03 notriddle

in 19ee692f9621f89f305096f423507e925b748b9a, with RUST_TEST_THREADS=4 to prevent OOM, so slower than usual

Command Time
cargo test --doc -p bevy_gizmos 15.32s
cargo test --doc -p bevy_gizmos --all-features 66.70s
cargo test --doc -p bevy_gizmos --release 3.08s
cargo test --doc -p bevy_gizmos --release --all-features 3.57s
cargo test --doc -p bevy_gizmos
bevy_ecs::system::assert_is_system commented out
1.78s
cargo test --doc -p bevy_gizmos --all-features
bevy_ecs::system::assert_is_system commented out
3.08s

hukasu avatar Mar 07 '25 15:03 hukasu

one thing to note is that --jobs is for compiling, to limit testing you use --test-threads

i.e. cargo test --doc --jobs 4 -- --test-threads 4 --test-threads is passed to the harness so it needs to be after --

these envvars also work

CARGO_BUILD_JOBS=4;
RUST_TEST_THREADS=4;

hukasu avatar Mar 07 '25 15:03 hukasu

Appears fixed in Rust 1.85.1: https://releases.rs/docs/1.85.1/. Doc tests for bevy_gizmos now take just a second or two and I couldn't see any big memory spikes.

I suspect that the bevy_ecs doc tests are failing to merge for some reason. This causes a slowdown - maybe 20 seconds or so - but not a big memory spike.

greeble-dev avatar Mar 19 '25 18:03 greeble-dev

Marking as fixed!

alice-i-cecile avatar Mar 19 '25 20:03 alice-i-cecile