Emergency Upgrade - Network stalls.
RC1 Testing:
Upgrade passed, but the network stalled after about 20 blocks.
Test Execution:
Network: 5 + 2
Name: jiuhong-test-jh
Genesis: feat-1.5.6 (9f3995853204a18f17de9c022233d22aa14b9c37)
Upgrade: 1.5.6 -> 2.0.0 (d6cf1e7a6b93db178751711394809ac5351ce3ff)
Test: Emergency Restart (with validator rotation)
- Shutdown all nodes
- Wait for 1 era
- Start all nodes back up with emergency restart.
- The network should continue.
- Startup nodes removed from new validator list

More testing scenario: All 7 nodes were shut down. Wait for 135 minutes. Start the new version 2_0_0. Then, after around 20 blocks, the network stalled.

:red_circle: Observations: Upgrade passed, but the network stalled after about 20 blocks.

http://genesis.casperlabs.io/jiuhong-test-jh/casper-node-dumps/jiuhong-test-jh/02052024_0830/dump_download_list.html

The log shows:

{"timestamp":"2024-05-02T14:46:09.817684Z","level":"ERROR","fields":{"message":"distribute block rewards failed due to auction error ValidatorNotFound"},"target":"casper_storage::global_state::state"}
{"timestamp":"2024-05-02T14:46:09.817807Z","level":"ERROR","fields":{"message":"failed to execute block","error":"Auction error: Validator not found"},"target":"casper_node::components::contract_runtime::utils"}
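
For anyone triaging the node dumps, a small script along these lines might help pull the relevant entries out of the JSON logs. It is only a sketch: it assumes each log line is a standalone JSON object shaped like the two lines quoted above (`timestamp`, `level`, `fields`, `target`), and the helper name `error_entries` is made up for illustration.

```python
import json


def error_entries(lines):
    """Yield (timestamp, target, message, error) for each ERROR-level JSON log line.

    Assumes one JSON object per line, shaped like the entries quoted in this issue.
    Lines that are empty, not valid JSON, or not JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(record, dict) or record.get("level") != "ERROR":
            continue
        fields = record.get("fields", {})
        yield (
            record.get("timestamp"),
            record.get("target"),
            fields.get("message"),
            fields.get("error"),  # None when the entry carries no "error" field
        )


# The two log lines quoted in this issue, used as sample input.
SAMPLE = [
    '{"timestamp":"2024-05-02T14:46:09.817684Z","level":"ERROR","fields":{"message":"distribute block rewards failed due to auction error ValidatorNotFound"},"target":"casper_storage::global_state::state"}',
    '{"timestamp":"2024-05-02T14:46:09.817807Z","level":"ERROR","fields":{"message":"failed to execute block","error":"Auction error: Validator not found"},"target":"casper_node::components::contract_runtime::utils"}',
]

if __name__ == "__main__":
    for timestamp, target, message, error in error_entries(SAMPLE):
        print(timestamp, target, message, error)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since iterating a file yields its lines.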
cc: @SaiProServ , @Jiuhong-casperlabs
From the scenario (emergency upgrade / twiddling the validator set), the error (validator not found), and the fact that it ran for 20 blocks, I'm inferring that it was able to run for an era with the rotated validator set and then hit a fatal issue when attempting to run the step process.
My guess is that the seigniorage recipient snapshot is not being properly twiddled by the global state utility. @fizyk20
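
To make the guess above concrete, here is a rough sketch of the kind of consistency check that would expose the problem. Everything in it is an assumption for illustration: it models the snapshot as a plain mapping of era ID to a mapping of validator key to recipient data, the function name `missing_recipients` and the `node-N` keys are hypothetical, and it does not read real global state.

```python
def missing_recipients(validators_by_era, snapshot):
    """Return {era_id: [validator keys absent from that era's snapshot entry]}.

    validators_by_era: hypothetical mapping of era ID -> iterable of validator keys
                       expected to be active in that era (after rotation).
    snapshot:          hypothetical mapping of era ID -> {validator key: recipient data},
                       standing in for the seigniorage recipient snapshot.
    """
    missing = {}
    for era_id, validators in validators_by_era.items():
        recipients = snapshot.get(era_id, {})
        absent = sorted(set(validators) - set(recipients))
        if absent:
            missing[era_id] = absent
    return missing


# Hypothetical data: the validator set was rotated to include node-6,
# but the snapshot for the same era still only lists the original nodes.
rotated_validators = {10: ["node-1", "node-2", "node-6"]}
stale_snapshot = {10: {"node-1": {}, "node-2": {}}}

if __name__ == "__main__":
    print(missing_recipients(rotated_validators, stale_snapshot))
```

If the snapshot were stale in this way, a validator that produced blocks during the era would have no recipient entry when rewards are distributed, which would be consistent with the `ValidatorNotFound` error at the step. That link is still a hypothesis, not a confirmed root cause.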
Housekeeping: Removed the obsolete release-blocker tag and added the correct release blocker tag.