explorer script hangs when script runs while script is running.

updated for clarity. very occasionally with 5 minute 'index update' using cronjobs, the running process seems to hang in such a way that later update processes fail to fail, seeming to result in many index/update processes hanging

i recently encountered a v1.7.4 explorer displaying a 31 hour old block, and >10 node processes running. there was evidence of an unclean shutdown of the wallet at the displayed block, this implies the sudden shutdown somehow effected the explorers updating process. given the 31 hour break, over 350 instances of update should have been run, i did not consider this and have no idea of what particular (or even exactly how many) instances were running at the time.

this method seems to result in the same behaviour, or, how to replicate, using two remote sessions, i ran rm tmp/index.pid; node scripts/sync.js index update then quickly on other session also run it.

output from first instance -

# rm tmp/index.pid; node scripts/sync.js index update
rm: cannot remove 'tmp/index.pid': No such file or directory
script launched with pid: 654526

output from second instance -

# rm tmp/index.pid; node scripts/sync.js index update
rm: cannot remove 'tmp/index.pid': No such file or directory
Script already running..

at this point, the first process was borken, resulting in further update processes also being borken. output from third instance -

# rm tmp/index.pid; node scripts/sync.js index update
script launched with pid: 654604

ctrl+c closes each process as expected, including allowing further updates to run normally if the initial process is closed. however running processes in background causes multiple update processes to run simultaneously

Nov 20 '20 07:11 bumbacoin

Why are we running further instances in consecutively?

The output from the 2nd instance can be summed up to the fact that the time between the removal of the index.pid and the creation of the lock allowed time for instance 3 or instance 1 in that case to create a lock (which is why "script already running" occurs). (This as well can occur because the cronjob fired in this void period.)

No aside from that question, what is the definition here of "continue to hang"? How long are these allowed to sit before you're exiting them, are these in separate sessions/screens or are they being Ctrl-C'd after X time? How far down the process are they making it before "hanging"? (ie: I see they make it to line 131, but are they making it to "db.update_db", "db.update_heavy" if heavy, is it making it to "db.update_tx_db"?)

If it's making it to update_tx_db, then understand that lines 729-731 (of database.js) is a taxing job. it's starting from the starting height and going until the end, just adding the block heights to an array "blocks_to_scan". So i'd fully expect if there are thousands of blocks, that this will take time before we finally get to 739 and start getting blockhashes.

Unlike 1.6.x, we no longer use the "Express" version of the nodeapi, so we no longer get output in the main "npm start" instance. Alot of it is just all behind the scenes now.

Nov 20 '20 14:11 uaktags

this was an effort to replicate a situation i have very occasionally come across in the wild. today i discovered an explorer (v1.7.4) with the display stuck on a block 31 hours previous, (the wallet debug did show a sudden stop at the stuck height, then restart). a cursory glance at running processes using top, showed >10 node processes running. over 350 'index updates' would've been triggered during the 31 hours, but i have no idea of the particular pids running before i stopped them.

my method above was done using two remote sessions. update # 1 and # 2 were done very closely together, this was enough to trigger the issue, so # 3 could be run at any time afterwards (as long as # 1 was still running)

ctrl+c closed each process as expected, but if run as background then they continue to accumulate. eg

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                             
    717 root      20   0  962820 114256      0 S  18.3   5.4  33:05.70 /usr/bin/node /root/explorer/bin/cluster                                                            
    716 root      20   0  963264 120164      0 S  15.9   5.7  33:54.21 /usr/bin/node /root/explorer/bin/cluster                                                            
    718 root      20   0  962720 133512      0 R  15.9   6.4  33:38.91 /usr/bin/node /root/explorer/bin/cluster                                                            
 659758 root      20   0  946936  63664   1352 S  14.6   3.0   0:12.92 node scripts/sync.js index update                                                                   
 659790 root      20   0  949900  56452      0 S  14.0   2.7   0:17.08 node scripts/sync.js index update                                                                   
 659774 root      20   0  945816  64544   3056 R  13.0   3.1   0:15.31 node scripts/sync.js index update                                                                   
 659655 root      20   0  948252  81876   1316 S  10.6   3.9   0:44.93 node scripts/sync.js index update                                                                   
    704 root      20   0  609652   9572   1252 S   4.0   0.5   5:44.05 /usr/bin/node /root/explorer/bin/cluster                                                            
    436 root      20   0  597880  12556      0 S   2.0   0.6   2:15.46 /usr/bin/node /usr/local/lib/node_modules/forever/bin/monitor bin/cluster r

i did happen to leave the extra console output from your suggestion in #400 and each process triggers this output, but this is as far as my checking has gone. in sync.js line 189 console.log("Coin: %s. Last stats: %s. Current BlockCount: %s",settings.coin, stats.last, stats.count);

i created this issue to see if it was known before i spent a few hours more on it. i had initially assumed the explorer was a candidate for #400 so i'd spent some time testing it in regards to that, so my apologies for not explaining this more fully. i'll do some more testing next day or two.

Nov 20 '20 15:11 bumbacoin

Unfortunately i suspect that the best way to nail down what is causing the issue for you is to traverse down and place the output comments to find out where is it going and how much time its spending on that topic. I'm only aware of the how slow the process can be to get up and going in its current form (and that there's no much for verbose output anymore). So I think we may need an instance of digging deeper with console.logs to track down what can be improved and what cant

Nov 20 '20 18:11 uaktags