Node keeps loosing peers.
Start a new warp sync using latest polkadot+substrate master with an empty db dir in GCP. The nodes keeps loosing almost all peers every couple of minutes.
a@a-db:~/src/polkadot$ ./target/release/polkadot -d ~/db3 --sync=warp -l db=trace,sync=trace 2>warp.log
2022-09-08 12:45:31.522 INFO main sc_cli::runner: Parity Polkadot
2022-09-08 12:45:31.522 INFO main sc_cli::runner: ✌️ version 0.9.28-b2ff4c05ab3
2022-09-08 12:45:31.522 INFO main sc_cli::runner: ❤️ by Parity Technologies <[email protected]>, 2017-2022
2022-09-08 12:45:31.522 INFO main sc_cli::runner: 📋 Chain specification: Polkadot
2022-09-08 12:45:31.522 INFO main sc_cli::runner: 🏷 Node name: certain-bean-5721
2022-09-08 12:45:31.522 INFO main sc_cli::runner: 👤 Role: FULL
2022-09-08 12:45:31.522 INFO main sc_cli::runner: 💾 Database: RocksDb at /home/arkadiy/db3/chains/polkadot/db/full
2022-09-08 12:45:31.522 INFO main sc_cli::runner: ⛓ Native runtime: polkadot-9280 (parity-polkadot-0.tx13.au0)
2022-09-08 12:45:35.022 INFO main sc_service::client::client: 🔨 Initializing Genesis block/state (state: 0x29d0…4e17, header-hash: 0x91b1…90c3)
2022-09-08 12:45:35.040 INFO main afg: 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.
2022-09-08 12:45:35.618 INFO main babe: 👶 Creating empty BABE epoch changes on what appears to be first startup.
2022-09-08 12:45:35.620 INFO main sub-libp2p: 🏷 Local node identity is: 12D3KooWL9R2Y3DuFdAueryD5nn6cL6BZs4RkMimC9cYkxyHE3Uh
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Operating system: linux
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 CPU architecture: x86_64
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Target environment: gnu
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 CPU cores: 8
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Memory: 16039MB
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Kernel: 4.19.0-21-cloud-amd64
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Linux distribution: Debian GNU/Linux 10 (buster)
2022-09-08 12:45:35.710 INFO main sc_sysinfo: 💻 Virtual machine: yes
2022-09-08 12:45:35.710 INFO main sc_service::builder: 📦 Highest known block at #0
2022-09-08 12:45:35.710 INFO tokio-runtime-worker substrate_prometheus_endpoint: 〽️ Prometheus exporter started at 127.0.0.1:9615
2022-09-08 12:45:35.713 INFO main sc_rpc_server: Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
2022-09-08 12:45:35.714 INFO main sc_rpc_server: Running JSON-RPC WS server: addr=127.0.0.1:9944, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
2022-09-08 12:45:35.714 INFO main sc_sysinfo: 🏁 CPU score: 609MB/s
2022-09-08 12:45:35.714 INFO main sc_sysinfo: 🏁 Memory score: 4530MB/s
2022-09-08 12:45:35.714 INFO main sc_sysinfo: 🏁 Disk score (seq. writes): 475MB/s
2022-09-08 12:45:35.714 INFO main sc_sysinfo: 🏁 Disk score (rand. writes): 204MB/s
2022-09-08 12:45:35.717 INFO tokio-runtime-worker libp2p_mdns::behaviour::iface: creating instance on iface 10.132.0.41
2022-09-08 12:45:38.867 INFO tokio-runtime-worker sub-libp2p: 🔍 Discovered new external address for our node: /ip4/34.78.133.145/tcp/30333/ws/p2p/12D3KooWL9R2Y3DuFdAueryD5nn6cL6BZs4RkMimC9cYkxyHE3Uh
2022-09-08 12:45:42.255 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading finality proofs, 15.97 Mib (19 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.7MiB/s ⬆ 26.9kiB/s
2022-09-08 12:45:49.053 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading finality proofs, 31.92 Mib (39 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.7MiB/s ⬆ 7.4kiB/s
2022-09-08 12:45:55.835 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading finality proofs, 47.89 Mib (45 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.7MiB/s ⬆ 16.7kiB/s
2022-09-08 12:45:55.836 WARN tokio-runtime-worker telemetry: ❌ Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }
2022-09-08 12:46:00.836 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 99.24 Mib (50 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 6.5MiB/s ⬆ 23.8kiB/s
2022-09-08 12:46:05.836 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 154.66 Mib (50 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 7.0MiB/s ⬆ 12.8kiB/s
2022-09-08 12:46:10.837 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 207.67 Mib (50 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 6.9MiB/s ⬆ 7.3kiB/s
2022-09-08 12:46:15.837 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 263.09 Mib (50 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 6.8MiB/s ⬆ 6.0kiB/s
2022-09-08 12:46:20.864 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 304.35 Mib (50 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 4.4MiB/s ⬆ 5.0kiB/s
2022-09-08 12:46:25.865 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 401.33 Mib (49 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 17.4MiB/s ⬆ 9.4kiB/s
2022-09-08 12:46:30.865 INFO tokio-runtime-worker substrate: ⏩ Warping, Downloading state, 500.22 Mib (38 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 17.7MiB/s ⬆ 10.9kiB/s
2022-09-08 12:46:35.866 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (37 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 13.8MiB/s ⬆ 5.5kiB/s
2022-09-08 12:46:40.866 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (29 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.6kiB/s ⬆ 2.5kiB/s
2022-09-08 12:46:45.866 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (28 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 23.9kiB/s ⬆ 15.6kiB/s
2022-09-08 12:46:50.866 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (31 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 3.0kiB/s ⬆ 2.9kiB/s
2022-09-08 12:46:55.867 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (28 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 16.8kiB/s ⬆ 10.3kiB/s
2022-09-08 12:47:00.867 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (36 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 3.4kiB/s ⬆ 3.8kiB/s
2022-09-08 12:47:04.229 WARN async-std/runtime trust_dns_proto::xfer::dns_exchange: failed to associate send_message response to the sender
2022-09-08 12:47:05.868 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (36 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 1.4kiB/s ⬆ 1.8kiB/s
2022-09-08 12:47:10.868 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (37 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 1.8kiB/s ⬆ 1.4kiB/s
2022-09-08 12:47:19.066 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (38 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 1.4kiB/s ⬆ 1.3kiB/s
2022-09-08 12:47:25.257 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (39 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.0kiB/s ⬆ 1.8kiB/s
2022-09-08 12:47:28.702 INFO tokio-runtime-worker db: Removing block #BlockId::Hash(0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3)
2022-09-08 12:47:30.412 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (40 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 2.8kiB/s ⬆ 2.7kiB/s
2022-09-08 12:47:35.574 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (29 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 5.2kiB/s ⬆ 5.9kiB/s
2022-09-08 12:47:40.762 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (32 peers), best: #0 (0x91b1…90c3), finalized #0 (0x91b1…90c3), ⬇ 3.8kiB/s ⬆ 3.8kiB/s
2022-09-08 12:47:48.754 INFO tokio-runtime-worker substrate: ⏩ Warping, Importing state, 577.70 Mib (28 peers), best: #11958735 (0x7bef…21c4), finalized #11958735 (0x7bef…21c4), ⬇ 9.0kiB/s ⬆ 5.6kiB/s
2022-09-08 12:47:50.786 INFO tokio-runtime-worker sync: Warp sync is complete (577 MiB), restarting block sync.
2022-09-08 12:47:52.460 INFO tokio-runtime-worker substrate: ✨ Imported #11958757 (0x130e…291d)
2022-09-08 12:47:53.758 INFO tokio-runtime-worker substrate: ⏩ Block history, #3648 (7 peers), best: #11958757 (0x130e…291d), finalized #11958735 (0x7bef…21c4), ⬇ 442.0kiB/s ⬆ 9.0kiB/s
2022-09-08 12:47:54.771 INFO tokio-runtime-worker substrate: ✨ Imported #11958758 (0xe724…8fbd)
2022-09-08 12:47:58.759 INFO tokio-runtime-worker substrate: ⏩ Block history, #14656 (21 peers), best: #11958758 (0xe724…8fbd), finalized #11958754 (0x3e53…0381), ⬇ 838.2kiB/s ⬆ 31.2kiB/s
2022-09-08 12:48:00.759 INFO tokio-runtime-worker substrate: ✨ Imported #11958759 (0x30e1…423e)
2022-09-08 12:48:00.822 INFO tokio-runtime-worker substrate: ✨ Imported #11958759 (0x75b0…42c0)
2022-09-08 12:48:03.760 INFO tokio-runtime-worker substrate: ⏩ Block history, #21504 (3 peers), best: #11958759 (0x30e1…423e), finalized #11958756 (0x4f92…0927), ⬇ 570.2kiB/s ⬆ 11.1kiB/s
2022-09-08 12:48:06.662 INFO tokio-runtime-worker substrate: ✨ Imported #11958760 (0x5779…1a5f)
2022-09-08 12:48:08.761 INFO tokio-runtime-worker substrate: ⏩ Block history, #30848 (23 peers), best: #11958760 (0x5779…1a5f), finalized #11958757 (0x130e…291d), ⬇ 1.0MiB/s ⬆ 87.0kiB/s
2022-09-08 12:48:13.310 INFO tokio-runtime-worker substrate: ✨ Imported #11958761 (0x4829…083b)
2022-09-08 12:48:13.434 INFO tokio-runtime-worker sc_informant: ♻️ Reorg on #11958761,0x4829…083b to #11958761,0xcf72…6052, common ancestor #11958760,0x5779…1a5f
2022-09-08 12:48:13.435 INFO tokio-runtime-worker substrate: ✨ Imported #11958761 (0xcf72…6052)
2022-09-08 12:48:13.763 INFO tokio-runtime-worker substrate: ⏩ Block history, #39424 (2 peers), best: #11958761 (0xcf72…6052), finalized #11958758 (0xe724…8fbd), ⬇ 784.1kiB/s ⬆ 60.7kiB/s
Notice how the number of peers jumps from 20+ to 2. Unfortunately our network protocol does not include reason for why other peers decide to disconnect. However it looks like this is happening after we announce latest blocks to other peers. It could be that something was added to the block announcement that broke network protocol compatibility because of this code here: https://github.com/paritytech/substrate/blob/ded44948e2d5a398abcb4e342b0513cb690961bb/primitives/consensus/common/src/block_validation.rs#L89
@dmitry-markin and @altonen could both of you please both look into this?
To share intermediate results: I was able to reproduce the issue on polkadot+substrate master, but also on polkadot-0.9.28 & polkadot-0.9.16. So, it looks like this is not something introduced by the recent changes. Also, a couple of times everything worked fine (on master), so I'm starting to be suspicious that the issue might be caused by some external factors.
So, it looks like this is not something introduced by the recent changes. Also, a couple of times everything worked fine (on master), so I'm starting to be suspicious that the issue might be caused by some external factors.
I saw this quite a while ago on my PC as well, but thought it was normal. Like dropping peers from 30 to 10 or even less, and then slowly creeping back up.
I didn't look into warping, but if it downloads significant chunks of data, it might have the same root cause as https://github.com/paritytech/substrate/issues/12105, namely keep alive timeout.
I didn't look into warping, but if it downloads significant chunks of data, it might have the same root cause as #12105, namely keep alive timeout.
I tried reducing keep alive timeout from 10 sec to 1 sec (locally) to see if it affects the peer count, and it looks like it doesn't. Overall, the peer count jitter seems purely random: sometimes the peer count is stable on 20-30 peers, and sometimes it starts dropping to almost 0.