snarkOS icon indicating copy to clipboard operation
snarkOS copied to clipboard

[Bug] Prover tries to communicate with the operator even if the connection failed, unable to reconnect

Open alextop opened this issue 4 years ago • 2 comments

🐛 Bug Report

I observe 2 issues right now.

  1. After the connection challenge exchange failed and the connection was not established, the prover still tries to send the messages. Probably it should ignore the disconnected peer until the next successful connection attempt. The log is below, last message is repeated constantly:
Feb 15 23:35:10 root1822 snarkos[3851472]: 2022-02-15T22:35:10.478778Z DEBUG Connecting to 65.21.24.215:4132...
Feb 15 23:35:10 root1822 snarkos[3851472]: 2022-02-15T22:35:10.499509Z TRACE Sending 'ChallengeRequest-A' to 65.21.24.215:4132
Feb 15 23:35:10 root1822 snarkos[3851472]: 2022-02-15T22:35:10.520252Z TRACE Received 'ChallengeRequest-B' from 65.21.24.215:4132
Feb 15 23:35:10 root1822 snarkos[3851472]: 2022-02-15T22:35:10.520265Z TRACE Sending 'ChallengeResponse-B' to 65.21.24.215:4132
Feb 15 23:35:10 root1822 snarkos[3851472]: 2022-02-15T22:35:10.520595Z TRACE Failed to get challenge response from 65.21.24.215:4132, peer has disconnected
Feb 15 23:35:11 root1822 snarkos[3851472]: 2022-02-15T22:35:11.438345Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:35:12 root1822 snarkos[3851472]: 2022-02-15T22:35:12.439630Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:35:13 root1822 snarkos[3851472]: 2022-02-15T22:35:13.440561Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
  1. The prover can not reconnect to the operator, the same challenge response failure is observed, see the (pre-filtered) log below.
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.108284Z DEBUG Connecting to 65.21.24.215:4132...
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.128997Z TRACE Sending 'ChallengeRequest-A' to 65.21.24.215:4132
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.149802Z TRACE Received 'ChallengeRequest-B' from 65.21.24.215:4132
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.149815Z TRACE Sending 'ChallengeResponse-B' to 65.21.24.215:4132
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.149864Z TRACE Failed to get challenge response from 65.21.24.215:4132, peer has disconnected
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.269559Z DEBUG Status Report (type = Prover, status = Ready, block_height = 308874, cumulative_weight = 119134067178, block_requests = 0, connected_peers = 16)
Feb 15 23:38:54 root1822 snarkos[3851472]: 2022-02-15T22:38:54.359022Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
.......
Feb 15 23:39:24 root1822 snarkos[3851472]: 2022-02-15T22:39:24.150909Z TRACE Skipping connection request to 65.21.24.215:4132 (tried 30 secs ago)
........
Feb 15 23:40:01 root1822 snarkos[3851472]: 2022-02-15T22:40:01.264002Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:40:01 root1822 snarkos[3851472]: 2022-02-15T22:40:01.264008Z TRACE Skipping connection request to 65.21.24.215:4132 (tried 67 secs ago)
Feb 15 23:40:01 root1822 snarkos[3851472]: 2022-02-15T22:40:01.264013Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
.......
Feb 15 23:42:05 root1822 snarkos[3851472]: 2022-02-15T22:42:05.054706Z TRACE Skipping connection request to 65.21.24.215:4132 (tried 190 secs ago)
.......
Feb 15 23:42:35 root1822 snarkos[3851472]: 2022-02-15T22:42:35.975320Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:42:35 root1822 snarkos[3851472]: 2022-02-15T22:42:35.975324Z DEBUG Connecting to 65.21.24.215:4132...
Feb 15 23:42:35 root1822 snarkos[3851472]: 2022-02-15T22:42:35.975373Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:42:35 root1822 snarkos[3851472]: 2022-02-15T22:42:35.996093Z TRACE Sending 'ChallengeRequest-A' to 65.21.24.215:4132
Feb 15 23:42:36 root1822 snarkos[3851472]: 2022-02-15T22:42:36.016826Z TRACE Received 'ChallengeRequest-B' from 65.21.24.215:4132
Feb 15 23:42:36 root1822 snarkos[3851472]: 2022-02-15T22:42:36.016836Z TRACE Sending 'ChallengeResponse-B' to 65.21.24.215:4132
Feb 15 23:42:36 root1822 snarkos[3851472]: 2022-02-15T22:42:36.016874Z TRACE Failed to get challenge response from 65.21.24.215:4132, peer has disconnected
Feb 15 23:42:39 root1822 snarkos[3851472]: 2022-02-15T22:42:39.909971Z DEBUG Status Report (type = Prover, status = Ready, block_height = 308884, cumulative_weight = 119140993855, block_requests = 0, connected_peers = 16)
......
Feb 15 23:43:10 root1822 snarkos[3851472]: 2022-02-15T22:43:10.282010Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
Feb 15 23:43:10 root1822 snarkos[3851472]: 2022-02-15T22:43:10.282134Z TRACE Skipping connection request to 65.21.24.215:4132 (tried 34 secs ago)
Feb 15 23:43:10 root1822 snarkos[3851472]: 2022-02-15T22:43:10.282227Z  WARN Attempted to send to a non-connected peer 65.21.24.215:4132
......

Steps to Reproduce

This probably depends on the network status but timestamps in the logs may help if operator logs for the same time period are examined.

Network layer state is ok, ping log is below:

PING 65.21.24.215 (65.21.24.215) 56(84) bytes of data.
64 bytes from 65.21.24.215: icmp_seq=1 ttl=59 time=20.7 ms
64 bytes from 65.21.24.215: icmp_seq=2 ttl=59 time=20.6 ms
64 bytes from 65.21.24.215: icmp_seq=3 ttl=59 time=20.6 ms
64 bytes from 65.21.24.215: icmp_seq=4 ttl=59 time=20.6 ms

Expected Behavior

  1. The first issue can be fixed if the prover will handle the connection status properly and will not attempt to send messages if the connection was not established or disconnected.
  2. The prover should successfully reconnect after a certain delay (seems it's 180 sec for now). The second case looks like some application logic issue happens.

Your Environment

  • snarkos 2.0.2
  • rustc 1.58.1 (db9d1b20b 2022-01-20)
  • Ubuntu 20.04
  • AMD Ryzen 3900x, 64GB RAM, 2*1TB NVMe Raid 0
  • 1GBPs network

alextop avatar Feb 15 '22 22:02 alextop

You might want to check the operator's log to see why there is no challenge response from the operator.

HarukaMa avatar Feb 16 '22 00:02 HarukaMa

Same node: Disconnected from operator

2022-02-09T15:56:15.340275Z DEBUG Connecting to 118.193.103.172:4132...
2022-02-09T15:56:15.340319Z TRACE Skipping connection request to 138.201.20.51:4132 (tried 117 secs
2022-02-09T15:56:15.340362Z TRACE Skipping connection request to 135.181.16.243:4132 (tried 45 secs
2022-02-09T18:00:33.738836Z TRACE Sending 'PeerRequest' to 35.228.82.203:4132
2022-02-09T18:00:33.738861Z TRACE Skipping connection request to 159.223.124.150:4132 (tried 27 secs
2022-02-09T18:00:33.738866Z TRACE Skipping connection request to 161.35.106.91:4132 (tried 108 secs
2022-02-10T03:39:09.719595Z TRACE Attempting connection to 137.184.202.162:4132...
2022-02-10T03:39:09.719597Z TRACE Attempting connection to 159.223.124.150:4132...
2022-02-10T03:39:09.719599Z TRACE Attempting connection to 143.198.166.150:4132...
2022-02-10T04:50:02.079182Z  INFO aleo_prover::prover: Share found for block 282696 with weight 31
2022-02-10T04:50:04.540246Z  INFO aleo_prover::prover: Share found for block 282696 with weight 1163
2022-02-10T04:50:04.567121Z  INFO aleo_prover::prover: Share found for block 282696 with weight 3185
2022-02-16T06:30:01.609898Z TRACE Attempting connection to 143.198.166.150:4132...
2022-02-16T06:30:01.609900Z TRACE Attempting connection to 144.126.219.193:4132...
2022-02-16T06:30:01.609902Z TRACE Attempting connection to 137.184.192.155:4132...
2022-02-16T06:30:01.609904Z TRACE Attempting connection to 157.245.133.62:4132...
2022-02-16T06:30:01.609907Z TRACE Attempting connection to 188.166.7.13:4132...
2022-02-16T06:30:01.609909Z TRACE Attempting connection to 165.232.145.194:4132...
2022-02-16T06:30:01.609912Z TRACE Attempting connection to 159.223.118.35:4132...
2022-02-16T06:30:01.609914Z TRACE Attempting connection to 159.223.124.150:4132...
2022-02-16T06:30:01.609916Z TRACE Attempting connection to 137.184.202.162:4132...
PING 65.21.24.215 (65.21.24.215) 56(84) bytes of data.
64 bytes from 65.21.24.215: icmp_seq=1 ttl=46 time=246 ms
64 bytes from 65.21.24.215: icmp_seq=2 ttl=46 time=232 ms
64 bytes from 65.21.24.215: icmp_seq=3 ttl=46 time=229 ms
64 bytes from 65.21.24.215: icmp_seq=4 ttl=46 time=227 ms
64 bytes from 65.21.24.215: icmp_seq=5 ttl=46 time=233 ms
64 bytes from 65.21.24.215: icmp_seq=6 ttl=46 time=239 ms
64 bytes from 65.21.24.215: icmp_seq=7 ttl=46 time=229 ms

Work991 avatar Feb 16 '22 07:02 Work991

Not applicable to testnet3.

ljedrz avatar Sep 02 '22 13:09 ljedrz