script hangs when script runs while script is running.
updated for clarity. very occasionally with 5 minute 'index update' using cronjobs, the running process seems to hang in such a way that later update processes fail to fail, seeming to result in many index/update processes hanging
i recently encountered a v1.7.4 explorer displaying a 31 hour old block, and >10 node processes running. there was evidence of an unclean shutdown of the wallet at the displayed block, this implies the sudden shutdown somehow effected the explorers updating process. given the 31 hour break, over 350 instances of update should have been run, i did not consider this and have no idea of what particular (or even exactly how many) instances were running at the time.
this method seems to result in the same behaviour, or, how to replicate,
using two remote sessions, i ran
rm tmp/index.pid; node scripts/sync.js index update
then quickly on other session also run it.
output from first instance -
# rm tmp/index.pid; node scripts/sync.js index update
rm: cannot remove 'tmp/index.pid': No such file or directory
script launched with pid: 654526
output from second instance -
# rm tmp/index.pid; node scripts/sync.js index update
rm: cannot remove 'tmp/index.pid': No such file or directory
Script already running..
at this point, the first process was borken, resulting in further update processes also being borken. output from third instance -
# rm tmp/index.pid; node scripts/sync.js index update
script launched with pid: 654604
ctrl+c closes each process as expected, including allowing further updates to run normally if the initial process is closed. however running processes in background causes multiple update processes to run simultaneously
Why are we running further instances in consecutively?
The output from the 2nd instance can be summed up to the fact that the time between the removal of the index.pid and the creation of the lock allowed time for instance 3 or instance 1 in that case to create a lock (which is why "script already running" occurs). (This as well can occur because the cronjob fired in this void period.)
No aside from that question, what is the definition here of "continue to hang"? How long are these allowed to sit before you're exiting them, are these in separate sessions/screens or are they being Ctrl-C'd after X time? How far down the process are they making it before "hanging"? (ie: I see they make it to line 131, but are they making it to "db.update_db", "db.update_heavy" if heavy, is it making it to "db.update_tx_db"?)
If it's making it to update_tx_db, then understand that lines 729-731 (of database.js) is a taxing job. it's starting from the starting height and going until the end, just adding the block heights to an array "blocks_to_scan". So i'd fully expect if there are thousands of blocks, that this will take time before we finally get to 739 and start getting blockhashes.
Unlike 1.6.x, we no longer use the "Express" version of the nodeapi, so we no longer get output in the main "npm start" instance. Alot of it is just all behind the scenes now.
this was an effort to replicate a situation i have very occasionally come across in the wild. today i discovered an explorer (v1.7.4) with the display stuck on a block 31 hours previous, (the wallet debug did show a sudden stop at the stuck height, then restart). a cursory glance at running processes using top, showed >10 node processes running. over 350 'index updates' would've been triggered during the 31 hours, but i have no idea of the particular pids running before i stopped them.
my method above was done using two remote sessions. update # 1 and # 2 were done very closely together, this was enough to trigger the issue, so # 3 could be run at any time afterwards (as long as # 1 was still running)
ctrl+c closed each process as expected, but if run as background then they continue to accumulate. eg
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
717 root 20 0 962820 114256 0 S 18.3 5.4 33:05.70 /usr/bin/node /root/explorer/bin/cluster
716 root 20 0 963264 120164 0 S 15.9 5.7 33:54.21 /usr/bin/node /root/explorer/bin/cluster
718 root 20 0 962720 133512 0 R 15.9 6.4 33:38.91 /usr/bin/node /root/explorer/bin/cluster
659758 root 20 0 946936 63664 1352 S 14.6 3.0 0:12.92 node scripts/sync.js index update
659790 root 20 0 949900 56452 0 S 14.0 2.7 0:17.08 node scripts/sync.js index update
659774 root 20 0 945816 64544 3056 R 13.0 3.1 0:15.31 node scripts/sync.js index update
659655 root 20 0 948252 81876 1316 S 10.6 3.9 0:44.93 node scripts/sync.js index update
704 root 20 0 609652 9572 1252 S 4.0 0.5 5:44.05 /usr/bin/node /root/explorer/bin/cluster
436 root 20 0 597880 12556 0 S 2.0 0.6 2:15.46 /usr/bin/node /usr/local/lib/node_modules/forever/bin/monitor bin/cluster r
i did happen to leave the extra console output from your suggestion in #400 and each process triggers this output, but this is as far as my checking has gone.
in sync.js line 189
console.log("Coin: %s. Last stats: %s. Current BlockCount: %s",settings.coin, stats.last, stats.count);
i created this issue to see if it was known before i spent a few hours more on it. i had initially assumed the explorer was a candidate for #400 so i'd spent some time testing it in regards to that, so my apologies for not explaining this more fully. i'll do some more testing next day or two.
Unfortunately i suspect that the best way to nail down what is causing the issue for you is to traverse down and place the output comments to find out where is it going and how much time its spending on that topic. I'm only aware of the how slow the process can be to get up and going in its current form (and that there's no much for verbose output anymore). So I think we may need an instance of digging deeper with console.logs to track down what can be improved and what cant