looping/high CPU with serial-tnc
I've been trying out pat v0.12.1 compiled from source in a debian11 vagrant box (with VirtualBox provider under MacOS Monterrey) using a serial-tnc and it appears to be in some sort of infinite loop (CPU goes to 100%). I don't really know enough to debug golang but here's what I've got (with PAT_DEBUG=1 in the env). Any pointers toward further debugging would be appreciated.
vagrant@debian11:~/src/pat$ dlv debug -- connect serial-tnc
2022-02-18T21:44:10Z error layer=debugger can't find build-id note on binary
Type 'help' for list of commands.
(dlv) c
2022/02/18 21:44:14 [DEBUG] Version: v0.12.1 (unknown origin) linux/amd64 - go1.17
2022/02/18 21:44:14 [DEBUG] Command: connect [serial-tnc]
2022/02/18 21:44:14 [DEBUG] Mailbox dir is '/home/vagrant/.local/share/pat/mailbox'
2022/02/18 21:44:14 [DEBUG] Forms dir is '/home/vagrant/.local/share/pat/Standard_Forms'
2022/02/18 21:44:14 [DEBUG] Config file is '/home/vagrant/.config/pat/config.json'
2022/02/18 21:44:14 [DEBUG] Log file is '/home/vagrant/.local/state/pat/pat.log'
2022/02/18 21:44:14 [DEBUG] Event log file is '/home/vagrant/.local/state/pat/eventlog.json'
2022/02/18 21:44:14 [DEBUG] tried to migrate from /home/vagrant/.wl2k but it doesn't exist; nothing to do
2022/02/18 21:44:14 Connecting to WB2ZII-10 (serial-tnc)...
received SIGINT, stopping process (will not forward signal)
2022-02-18T21:45:10Z error layer=debugger error loading binary "/lib/x86_64-linux-gnu/libc.so.6": could not parse .eh_frame section: pointer encoding not supported 0x9b at 0x5784
> runtime.epollwait() /usr/local/go/src/runtime/sys_linux_amd64.s:666 (PC: 0x46c2a0)
Warning: debugging optimized function
661: MOVL nev+16(FP), DX
662: MOVL timeout+20(FP), R10
663: MOVQ $0, R8
664: MOVL $SYS_epoll_pwait, AX
665: SYSCALL
=> 666: MOVL AX, ret+24(FP)
667: RET
668:
669: // void runtime·closeonexec(int32 fd);
670: TEXT runtime·closeonexec(SB),NOSPLIT,$0
671: MOVL fd+0(FP), DI // fd
(dlv) frame
Command failed: not enough arguments
(dlv) stack
0 0x000000000046c2a0 in runtime.epollwait
at /usr/local/go/src/runtime/sys_linux_amd64.s:666
1 0x0000000000434d7c in runtime.netpoll
at /usr/local/go/src/runtime/netpoll_epoll.go:127
2 0x0000000000440345 in runtime.findrunnable
at /usr/local/go/src/runtime/proc.go:2947
3 0x00000000004414f4 in runtime.schedule
at /usr/local/go/src/runtime/proc.go:3367
4 0x0000000000441a2d in runtime.park_m
at /usr/local/go/src/runtime/proc.go:3516
5 0x0000000000468063 in runtime.mcall
at /usr/local/go/src/runtime/asm_amd64.s:307
(dlv) exit
vagrant@debian11:~/src/pat$ tail ~/.local/state/pat/pat.log
2022/02/18 21:44:14 Connecting to WB2ZII-10 (serial-tnc)...
The above is probably not super-useful as that looks like an event waiter loop.
I've confirmed that the serial TNC connection to my Kenwood TM-D710 works with minicom. I have intermittently successfully connected to the packet node with pat.
More info:
vagrant@debian11:~/src/pat$ uname -a
Linux debian11.localdomain 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux
vagrant@debian11:~/src/pat$ go version
go version go1.17 linux/amd64
vagrant@debian11:~/src/pat$ lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 050d:0109 Belkin Components F5U109/F5U409 PDA Adapter
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Actually, I'm not sure I've every successfully connect or if the TNC CONNECT command has been issued is all. Is this stuff doing TNC commands or KISS?
Looking at the code, my best guess is that is this call returns an error (which we ignore) or some unexpected data stream that never ends. The for loop would end up in an endless loop. We need to fix this by properly checking the returned error, and somehow detect unexpected data returned.
The serial-tnc transport is a neglected part of Pat. It's badly written and only ever tested properly with TH-D72/TM-D710. For now, you are probably better of running your TNC in KISS mode and use the ax25:// transport instead. More on this topic here.
It should work work with your TM-D710, but unfortunately the code is in such a poor state that it is hard to debug the cause. Might be incorrect serial baudrate setting, or some issue with your USB serial adapter.. but again, most users use the kernel's ax25 stack with ax25:// instead of serial-tnc since this is much more stable and avoids any flow control issues often associated with running TNCs in transparent connected mode.
@martinhpedersen I switched to ax.25 kiss and it works great. It's always been my preferred way to connect a TNC. Up to you if you want to keep this issue open w.r.t not checking error status in kenwood.go.
@n2ygk Glad it worked out for you 👍
I think we should keep this issue open and fix that error handling regardless. I'm also considering adding some timeouts, which would make it a bit more robust 🙂