Network performance improvements: "dropping message to peer" and "Can't deliver message to subscription for topic"

Open pdyraga opened this issue 2 years ago • 2 comments

The following warnings were logged during key generation and heartbeat signing on a machine with good performance. We need to investigate the cause and determine whether some parameters should be tuned.

[email protected]/floodsub.go:95	dropping message to peer 16Uiu2HAkwbed4RCLysj3TRKeR8UabPThKqWWvd1sW2HtAWgm7nYx: queue full
[email protected]/pubsub.go:950	Can't deliver message to subscription for topic tbtc-04b0a483e97dfbb15e88ecbc2ef8f7e37776dd713eed54089f87704a4fbae0442aa3ff5a0ec7f4d52bf4680b5b9b70be9accb64970df11b96abb61d48be7d8db8b; subscriber too slow

Log provided by one of the beta stakers: 65a4eb.log

pdyraga avatar Jan 30 '23 21:01 pdyraga

One of our nodes loses all of its peer connections 3 times a day (at every heartbeat sync). Looking at the graphs, I see it consumes much more memory and CPU during the sync process. Before the peers are disconnected I see a lot of "queue full" as well as "Can't deliver message to subscription for topic" messages, and then all the peers and bootstrap nodes are disconnected.

An interesting fact is that only one of our 6 nodes fails like this; all the rest work normally. And the failing one fails every time. I tried rebuilding the OS and all the software from scratch and adding more memory and CPU, but all in vain: it still fails. From my perspective, for some reason some nodes handle many more requests, which fills the queue rapidly. So we either need to find out why this happens or try to increase the queue length. Attaching logs of such an event.

full_05_1.log 2f5_20.40-50.log

antonr-p2p avatar Feb 22 '23 09:02 antonr-p2p

BTW you can find more details here: https://discord.com/channels/866378471868727316/965614739562565673/1075320930559086612

antonr-p2p avatar Feb 22 '23 09:02 antonr-p2p