Fix undefined behavior on simultaneous write in daemon state file
Simultaneous daemon state file writes currently are not handled. This PR aims to handle such cases. So there are 2 scenarios when simultaneous write can happen:
- Another program that uses daemon state in the process of writing in this file
- Another daemon of this program tries to write into a file
How this PR tries to fix it: For the 1.
- We lock a file with
file-lock, which ensures another program won't be able to write in this file (We don't lock read-only daemonstates)
For the 2.
- We create global state of locked files by this program with
once_cell::sync::Lazy. - We disallow other daemons from any thread to access this lock, unless it's cloned directly from the daemon, who originally locked it.
- As soon as every daemon dropped state we unlock file
Beyond safety this PR adds few optimizations:
- Write json to a file only when all of the state clones done with using this state file
- Read from a shared state instead of file, when it's in a locked state
Deploying cw-orchestrator with
Cloudflare Pages
| Latest commit: |
7eae16a
|
| Status: | ⚡️ Build in progress... |
It also can corrupt state, in such cases:
{
"juno": {
"juno-1": {
"code_ids": {},
"test": {
"test0": "test-0", } "te}
:
"test-3"
}
}
}
}
Codecov Report
Attention: Patch coverage is 80.15267% with 52 lines in your changes missing coverage. Please review.
Project coverage is 56.6%. Comparing base (
9c43514) to head (a795bbe).
Additional details and impacted files
| Files | Coverage Δ | |
|---|---|---|
| cw-orch-daemon/src/channel.rs | 83.7% <100.0%> (+0.7%) |
:arrow_up: |
| cw-orch-daemon/src/error.rs | 57.1% <ø> (ø) |
|
| cw-orch-daemon/src/queriers/cosmwasm.rs | 51.5% <100.0%> (ø) |
|
| cw-orch-daemon/src/sync/core.rs | 78.1% <100.0%> (+3.1%) |
:arrow_up: |
| cw-orch-daemon/src/tx_builder.rs | 71.2% <100.0%> (ø) |
|
| packages/clone-testing/src/state.rs | 100.0% <100.0%> (ø) |
|
| cw-orch-daemon/src/sync/builder.rs | 88.6% <63.6%> (-5.6%) |
:arrow_down: |
| cw-orch-daemon/src/json_lock.rs | 92.1% <92.1%> (ø) |
|
| cw-orch-daemon/src/builder.rs | 69.4% <78.5%> (+1.6%) |
:arrow_up: |
| cw-orch-daemon/src/core.rs | 58.6% <33.3%> (+0.9%) |
:arrow_up: |
| ... and 3 more |
This is particularly crucial when using daemons that interact with different chains !
This is particularly crucial when using daemons that interact with different chains !
This is also needed when interacting with 1 chain
It also can corrupt state, in such cases
Update: seems like it was because of mid-way aborted thread.
@Kayanski wrote a quick proof of concept implementation, that don't break any existing test as well as succeeds the new multi-threaded ones
I'm a bit confused by the global Mutex. It seems like you're using a locking library (lock-file) but then added your own locking solution on top. Why did you do that? Could you explain your implementation (preferably in the description of the PR)?
I'm a bit confused by the global
Mutex. It seems like you're using a locking library (lock-file) but then added your own locking solution on top. Why did you do that? Could you explain your implementation (preferably in the description of the PR)?
Updated PR description! This PR originally had only failing tests with missleading error "EOF while parsing a value", which you can find in cw-orch-daemon/tests/daemon_state.rs, to discuss how we want to handle it.
See clippy err
So if I understand this PR correctly the JSON state file will be locked when the
Daemonis constructed? Any other threads that attempt to construct aDaemonwill be blocked until this original Daemon is dropped?
- File locked after first daemon points to the file
- When using state daemon locks json and unlocks after use
- Other threads of daemons will be blocked for write/read until previous finished his action
- As soon as all of the daemons dropped "pointer" to state file we write result to file
- If daemon tries to use state file that's already locked by another process it will panic, to prevent hang (see tests)
So if I understand this PR correctly the JSON state file will be locked when the
Daemonis constructed? Any other threads that attempt to construct aDaemonwill be blocked until this original Daemon is dropped?
File locked after first daemon points to the file
When using state daemon locks json and unlocks after use
- Other threads of daemons will be blocked for write/read until previous finished his action
As soon as all of the daemons dropped "pointer" to state file we write result to file
If daemon tries to use state file that's already locked by another process it will panic, to prevent hang (see tests)
Referring to our discussion on Discord. Do we all agree that the second daemon that attempts to access the file should throw an err instead of waiting for the lock to drop?
So if I understand this PR correctly the JSON state file will be locked when the
Daemonis constructed? Any other threads that attempt to construct aDaemonwill be blocked until this original Daemon is dropped?
File locked after first daemon points to the file
When using state daemon locks json and unlocks after use
- Other threads of daemons will be blocked for write/read until previous finished his action
As soon as all of the daemons dropped "pointer" to state file we write result to file
If daemon tries to use state file that's already locked by another process it will panic, to prevent hang (see tests)
Referring to our discussion on Discord. Do we all agree that the second daemon that attempts to access the file should throw an err instead of waiting for the lock to drop?
But other Daemons can still use this state file if only it's clone of original daemon, right?
So if I understand this PR correctly the JSON state file will be locked when the
Daemonis constructed? Any other threads that attempt to construct aDaemonwill be blocked until this original Daemon is dropped?
File locked after first daemon points to the file
When using state daemon locks json and unlocks after use
- Other threads of daemons will be blocked for write/read until previous finished his action
As soon as all of the daemons dropped "pointer" to state file we write result to file
If daemon tries to use state file that's already locked by another process it will panic, to prevent hang (see tests)
Referring to our discussion on Discord. Do we all agree that the second daemon that attempts to access the file should throw an err instead of waiting for the lock to drop?
But other Daemons can still use this state file if only it's clone of original daemon, right?
Yes! So the ownership check should be performed on daemon(state) construction. Clones shouldn't check.
@Kayanski @CyberHoward On poisoned Mutex state should we allow saving state in case user called manually force_write? Alternatively we can add parameter ignore_poison to give more control
@Kayanski @CyberHoward On poisoned Mutex state should we allow saving state in case user called manually
force_write? Alternatively we can add parameterignore_poisonto give more control
Let's fail if there's a poison, we can develop the feature if someone asks later