rust icon indicating copy to clipboard operation
rust copied to clipboard

Always check the result of `pthread_mutex_lock`

Open joboet opened this issue 2 years ago • 28 comments

Fixes #120147.

Instead of manually adding a list of "good" platforms, I've simply made the check unconditional. pthread's mutex is already quite slow on most platforms, so one single well-predictable branch shouldn't hurt performance too much.

joboet avatar Jan 22 '24 15:01 joboet

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

rustbot avatar Jan 22 '24 15:01 rustbot

Wow, thanks for beating me to this!

cbiffle avatar Jan 22 '24 20:01 cbiffle

r=me with commits squashed

Mark-Simulacrum avatar Jan 27 '24 16:01 Mark-Simulacrum

@bors r+

Mark-Simulacrum avatar Jan 27 '24 17:01 Mark-Simulacrum

:pushpin: Commit 8d708e55eff9026eb379fe409e9ecc0619962640 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors avatar Jan 27 '24 17:01 bors

@bors rollup=iffy I suspect this fails here: https://github.com/rust-lang/rust/pull/120441#issuecomment-1913605791

matthiaskrgr avatar Jan 28 '24 13:01 matthiaskrgr

@matthiaskrgr I think that rollup failed because of https://github.com/rust-lang/rust/pull/120232. This PR doesn't modify anything that would cause the errors seen in the rollup.

Urgau avatar Jan 28 '24 14:01 Urgau

Hmm well I have been scratching my head at that failure for a bit, guess we can just leave both prs on iffy, just in case :sweat_smile:

matthiaskrgr avatar Jan 28 '24 16:01 matthiaskrgr

I've seen these issues before (e.g. #103193). They only occur on some targets, while others using the same code work just fine... I suspect that this is a bug in the const-checking logic...

Either way, this PR should be fine (it really does not touch anything related to const). If it keeps on failing, I'll just add the attribute.

joboet avatar Jan 28 '24 17:01 joboet

:hourglass: Testing commit 8d708e55eff9026eb379fe409e9ecc0619962640 with merge 56e88c45eaa0fb8cde65c1f50ee7948416598440...

bors avatar Jan 29 '24 11:01 bors

:boom: Test timed out

bors avatar Jan 29 '24 15:01 bors

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

rust-log-analyzer avatar Jan 29 '24 15:01 rust-log-analyzer

@bors retry macos timeout again

ChrisDenton avatar Jan 29 '24 17:01 ChrisDenton

@ChrisDenton im suspecting that this pr causes a deadlock while bootstrapping somehow, my rollup that contained it failed reproducibly for more than 5 times other prs have passed in the meantime

matthiaskrgr avatar Jan 29 '24 17:01 matthiaskrgr

But it hung on the LLVM stage? It didn't even get to building rust? Built target llvm-reduce was the last message before the timeout.

ChrisDenton avatar Jan 29 '24 17:01 ChrisDenton

Maybe parts of bootstrap use parallelism and this used stuff of the pr? I dunnno... let's try to @bors rollup=never at least

matthiaskrgr avatar Jan 29 '24 17:01 matthiaskrgr

@bors r- Errors are returned directly instead of through errno; the fail code does not work correctly (that cannot have caused the CI hangs, however).

joboet avatar Jan 29 '24 18:01 joboet

I've fixed it now. I really should not have relied on memory here 🤦, it turns out pthread_mutex_lock directly returns error codes.

@rustbot ready

joboet avatar Jan 29 '24 18:01 joboet

@bors r+ rollup=never

Mark-Simulacrum avatar Jan 29 '24 23:01 Mark-Simulacrum

:pushpin: Commit 1df1ebf6add883bcd620c607da40cda3b0fcfc3d has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors avatar Jan 29 '24 23:01 bors

:hourglass: Testing commit 1df1ebf6add883bcd620c607da40cda3b0fcfc3d with merge 8c65ba0e66bdaffb6f85b4e4e148ade49ddb5e6c...

bors avatar Jan 29 '24 23:01 bors

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[2998/3236] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o
[2999/3236] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/DispatchStatistics.cpp.o
[3000/3236] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionInfoView.cpp.o
[3001/3236] Building CXX object tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionView.cpp.o
##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
##[group]Clock drift check
  local time: Tue Jan 30 00:02:09 UTC 2024
  network time: Tue, 30 Jan 2024 00:02:09 GMT
##[endgroup]
---
[3048/3236] Building CXX object tools/llvm-pdbutil/CMakeFiles/llvm-pdbutil.dir/MinimalSymbolDumper.cpp.o
[3049/3236] Building CXX object tools/llvm-pdbutil/CMakeFiles/llvm-pdbutil.dir/llvm-pdbutil.cpp.o
[3050/3236] Building CXX object tools/llvm-pdbutil/CMakeFiles/llvm-pdbutil.dir/MinimalTypeDumper.cpp.o
[3051/3236] Linking CXX executable bin/llvm-opt-report
Session terminated, killing shell... ...killed.
##[error]The operation was canceled.

rust-log-analyzer avatar Jan 30 '24 00:01 rust-log-analyzer

:broken_heart: Test failed - checks-actions

bors avatar Jan 30 '24 00:01 bors

So, that's quite strange. Am I reading that correctly -- we're getting a deadlock on aarch64-gnu bootstrapping?

Edit: Is it possible this is a load-dependent timeout firing on a runner, and it should just be reattempted? The log output sure looks like it failed during C++ compilation bits, which I wouldn't expect this change to affect.

cbiffle avatar Feb 01 '24 18:02 cbiffle

On reflection, Linux does not even build the module being changed, so I'm pretty confident this is flake, and would retrigger the build if I knew how / had the permissions (not sure about the latter).

cbiffle avatar Feb 01 '24 20:02 cbiffle

Well, let's see if this works.

@bors retry

cbiffle avatar Feb 02 '24 20:02 cbiffle

@cbiffle: :key: Insufficient privileges: not in try users

bors avatar Feb 02 '24 20:02 bors

Yeah, that's what I figured.

@Mark-Simulacrum I'm pretty sure this is build flake, since Linux doesn't build the code changed in this PR (it uses the futex version) -- do you think it's worth retrying?

cbiffle avatar Feb 02 '24 20:02 cbiffle

@bors r+ rollup=never

Yes, let's try again.

Mark-Simulacrum avatar Feb 04 '24 16:02 Mark-Simulacrum

:bulb: This pull request was already approved, no need to approve it again.

  • This pull request previously failed. You should add more commits to fix the bug, or use retry to trigger a build again.
  • There's another pull request that is currently being tested, blocking this pull request: #120543

bors avatar Feb 04 '24 16:02 bors