chainweb-node
chainweb-node copied to clipboard
chainweb-node seems to open the same log file twice for writting
which causes scripts/run-nodes.sh to fail with errors like this:
bash ./scripts/run-nodes.sh ./result/ghc/chainweb/bin/chainweb-node 10 /tmp/run-nodes-logs
starting 10 chainweb nodes
started bootstrap node 0
started node 1
chainweb-node: /tmp/run-nodes-logs/telemetry.node0.log: openFile: resource busy (file is locked)
started node 2
chainweb-node: /tmp/run-nodes-logs/telemetry.node1.log: openFile: resource busy (file is locked)
started node 3
chainweb-node: /tmp/run-nodes-logs/telemetry.node2.log: openFile: resource busy (file is locked)
started node 4
chainweb-node: /tmp/run-nodes-logs/telemetry.node3.log: openFile: resource busy (file is locked)
chainweb-node: /tmp/run-nodes-logs/telemetry.node4.log: openFile: resource busy (file is locked)
started node 5
started node 6
chainweb-node: /tmp/run-nodes-logs/telemetry.nod
When run with strace, by means of a little wrapper script (mynode.sh):
#!/usr/bin/env bash
strace ./result/ghc/chainweb/bin/chainweb-node "$@"
Run this way:
bash ./scripts/run-nodes.sh ./mynode.sh 10 /tmp/run-nodes-logs 2> /tmp/trace-log
I get the following strace (only the interesting part):
openat(AT_FDCWD, "/tmp/run-nodes-logs/telemetry.node0.log", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 56
fstat(56, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(56, 0) = 0
ioctl(56, TCGETS, 0x7ffe2fcc2f00) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(56, TCGETS, 0x7ffe2fcc2f00) = -1 ENOTTY (Inappropriate ioctl for device)
futex(0x7ffbc0000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbc0000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x39e3ae8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffb84000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffb84000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x39f4088, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbb0000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbb0000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a14bc8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbbc000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbbc000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a25168, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbc8000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbc8000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a35708, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbc4000bac, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbc4000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a45ca8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbdc000bac, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbdc000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a56248, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbd4000bac, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbd4000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a667e8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbd8000bac, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffbd8000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a76d88, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffba8000ba8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7ffba8000bb0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x3a87328, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=380220146}) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=344055}, ru_stime={tv_sec=0, tv_usec=36322}, ...}) = 0
mmap(0x4200e00000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4200e00000
madvise(0x4200e00000, 1048576, MADV_WILLNEED) = 0
madvise(0x4200e00000, 1048576, MADV_DODUMP) = 0
sched_yield() = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=391705730}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
openat(AT_FDCWD, "/tmp/run-nodes-logs/telemetry.node0.log", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 57
Notice the openat in the last line and the one in the first line - same file, but no close in between.
I was not able to find the reason for this yet - still investigating.
I'm also able to reproduce this:
[vaibhavsagar@nixos:~/repos/chainweb-node]$ bash ./scripts/run-nodes.sh result/ghc/chainweb/bin/chainweb-node 10 /tmp/run-nodes-logs
starting 10 chainweb nodes
started bootstrap node 0
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node0.log: openFile: resource busy (file is locked)
started node 1
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node1.log: openFile: resource busy (file is locked)
started node 2
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node2.log: openFile: resource busy (file is locked)
started node 3
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node3.log: openFile: resource busy (file is locked)
started node 4
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node4.log: openFile: resource busy (file is locked)
started node 5
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node5.log: openFile: resource busy (file is locked)
started node 6
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node6.log: openFile: resource busy (file is locked)
started node 7
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node7.log: openFile: resource busy (file is locked)
started node 8
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node8.log: openFile: resource busy (file is locked)
started node 9
disabling tx index
chainweb-node: /tmp/run-nodes-logs/telemetry.node9.log: openFile: resource busy (file is locked)
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Issue can be worked around, by simply not passing in a log directory (uses stdout then) - which seems to work fine.
Bug -- @larskuhtz please confirm or close