
Node crashes for an unknown reason when the operation "Paranet sync: Syncing asset" starts for multiple assets.

Open • Larsk97 opened this issue 1 year ago • 1 comment

Issue description

Currently I have multiple nodes trying to sync assets from 4 open paranets:

  • did:dkg:base:84532/0xb8b904c73d2fb4d8c173298a51c27fab70222c32/5670593
  • did:dkg:base:84532/0xb8b904c73d2fb4d8c173298a51c27fab70222c32/5670674
  • did:dkg:base:84532/0xb8b904c73d2fb4d8c173298a51c27fab70222c32/5670755
  • did:dkg:base:84532/0xb8b904c73d2fb4d8c173298a51c27fab70222c32/5670819

They also sync 4 curated paranets, which currently have 0 KAs (Knowledge Assets) attached.

The issue is that my nodes call getKnowledgeAssetLocator for every asset in the paranet, also using withPagination: [screenshot]

Once this appears to be finished for the whole paranet, the sync operations begin, which in my case means thousands of operations: [screenshot]

Then otnode.service suddenly crashes with status=9/KILL: [screenshot]

On the other node, it failed with status=1/Failure and a warning about packets arriving out of order: [screenshot]
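The flow described above, where a node first collects every asset locator and then starts thousands of sync operations at once, is the kind of workload where capping the number of operations in flight is a common mitigation. Whether unbounded concurrency is actually what causes these crashes is an assumption; the report and logs here do not establish it. The TypeScript sketch below is illustrative only: runWithConcurrencyLimit and syncAsset are hypothetical names, not ot-node functions, and the limit of 4 is arbitrary.

```typescript
// Hypothetical sketch, not ot-node code: process a large list of sync jobs
// while keeping at most `limit` of them in flight at any time.

/**
 * Runs `worker` over every item in `items`, with at most `limit` calls
 * pending at once. Results are returned in the same order as `items`.
 * If any call rejects, the returned promise rejects with the first such
 * error; the other lanes are not cancelled and keep working through the
 * remaining items.
 */
async function runWithConcurrencyLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each lane claims the next unclaimed index, awaits its job, and repeats.
  // JavaScript runs this loop on a single thread, so two lanes can never
  // claim the same index.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Example: sync a few hypothetical UALs, four at a time.
// `syncAsset` is a placeholder that only simulates network latency.
async function syncAsset(ual: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `synced ${ual}`;
}

const exampleUals = Array.from({ length: 20 }, (_, i) => `did:dkg:example/${i}`);
runWithConcurrencyLimit(exampleUals, 4, syncAsset).then((done) => {
  console.log(`${done.length} assets synced`); // prints "20 assets synced"
});
```

The fixed pool of lanes keeps the number of pending operations proportional to the limit rather than to the number of assets. A real implementation would also need retries and some form of progress tracking, so that a restart does not repeat the entire locator fetch and sync from scratch.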

Expected behavior

The node should be able to handle the paranet sync operations and later confirm that they completed successfully, then continue working as normal while keeping track of the paranets it should sync.

Actual behavior

The node crashes after fetching all asset locators for the paranets and then trying to sync them all. The node then restarts, repeats the whole process of fetching asset locators and syncing, and crashes again. With the 4 paranet UALs above configured for sync, the node crashes every 3-4 hours. See the attached log.

Steps to reproduce the problem

  1. Configure the node to sync assets from the 4 paranet UALs listed above.

Specifications

  • Node version: 16.20.1
  • Platform: Ubuntu 24.04
  • Node wallet: all my nodes have the same problem; here is one of them: 0x157a446C596f4845e68A256EB71302Fdf82BC4D8
  • Node libp2p identity: 284
  • Server spec: 4 vCPUs, 8 GB RAM, 80 GB SSD

Contact details

  • OriginTrail Discord: Lars / ksral (originally known on Discord as KsraL#0123)

Error logs

The node crashed 4 times in the timespan shown here: [screenshot]

The error logs I provided are from the following timespan on my node: [screenshot]

The crash happens between the log messages at 05:12:51 and 05:14:08. The log is attached as a file: logs_output.txt

Disclaimer

Please be aware that the issue reported on a public repository allows everyone to see your node logs, node details, and contact details. If you have any sensitive information, feel free to share it by sending an email to [email protected].

Larsk97 • Oct 23 '24 10:10

Also, this is related to #3345.

Larsk97 • Oct 23 '24 10:10

This is still an issue on all nodes after upgrading to Node.js 20.18.0 and using a paid RPC.

Larsk97 • Oct 24 '24 08:10

Syncing should be optimized with #3347; let's see how it goes after the new release.

br41nlet • Oct 24 '24 14:10