pppd to terminate on "LCP: timeout sending Config-Requests"
Hi,
We've picked up a few oddball cases where we expect pppd to terminate. It seems the common factor is the last line stating "LCP: timeout sending Config-Requests", eg:
Apr 6 18:56:56 hostname pppd[6145]: Plugin radius.so loaded.
Apr 6 18:56:56 hostname pppd[6145]: RADIUS plugin initialized.
Apr 6 18:56:56 hostname pppd[6145]: Plugin radattr.so loaded.
Apr 6 18:56:56 hostname pppd[6145]: RADATTR plugin initialized.
Apr 6 18:56:56 hostname pppd[6145]: Plugin pppol2tp.so loaded.
Apr 6 18:56:56 hostname pppd[6145]: pppd 2.5.1 started by root, uid 0
Apr 6 18:56:56 hostname pppd[6145]: Using interface ppp1
Apr 6 18:56:56 hostname pppd[6145]: Connect: ppp1 <-->
Apr 6 18:56:56 hostname pppd[6145]: Overriding mtu 1500 to 1363
Apr 6 18:56:56 hostname pppd[6145]: Overriding mru 1500 to mtu value 1363
Apr 6 18:57:26 hostname pppd[6145]: LCP: timeout sending Config-Requests
Apr 7 23:24:15 hostname pppd[6145]: Terminating on signal 15
Apr 7 23:24:15 hostname pppd[6145]: Connection terminated.
Apr 7 23:24:15 hostname pppd[6145]: Exit.
You will note more than 24h later I killed this using "kill 6145" at which point it terminated.
I find it a bit oddball that xl2tpd also doesn't kill the pppd since I'm assuming the remote l2tp service has become non-responsive ... but the simplest "fix" is to just have pppd terminate here I reckon, please confirm my assessment, will look into code.
@paulusmack: What do you think?
What options are in use here? If you can reproduce with the 'debug' option enabled so we can see the actual negotiation (if any), that might also help. Also, is this new behavior, or do older versions of pppd also do this?
Hi,
I still need to find a reliable way to reproduce. Only seeing this on two specific hosts, where I suspect the "remote" side vanishes during setup (they're incoming from mobile source, which in my experience means they frequently disconnect and connect to the internet). My suspicion is that if LCP times out from establishing during the early phases, or during auth, then this can happen.
I've got no ideal if this is old or new behaviour, I've not seen (observed) it before, but that unfortunately doesn't eliminate this as a potential old existing issue.
@nomis, @rfc1036, @zdzichu, @pali, @carlsonj, @masputra, @enaess, @chipitsine, @sthibaul, @alarig, @EasyNetDev, @anphsw, @dfskoll, @XWwalker, @kurt-vd, @nkukard, @kad, @tisj: What do you think?
I do still see some cases where pppd instances don't close when they're supposed to. Usually this is a non-issue, but I have also seen a few on pppoe now. So something somewhere is up. I suspect (based on events the last few days) it might be related to radius blocking, but I've not managed to confirm yet. I don't think this is critical for a 2.5.3.
Failing to exit after an LCP config-request timeout itself sounds like a problem, and my first guess would be that there's something wrong with the underlying serial transport -- it's hanging on close() or pppd is unexpectedly configured for demand or persist mode or something like that. A stack backtrace in this state might help. And I definitely agree that a debug log and set of options/configuration is going to help.
The latter cases from 4 days ago should be fixed by #572 - but that won't affect the original issue mentioned here. I think the best approach is to attach with strace the next time it happens, I won't be surprised if it's something similar to #572 ...
sorry for naive question. @jkroonza do you maybe have core dump collection configured on that host ? it would be quite interesting to see "gdb bt"
is that "signal 15" arriving from xl2tpd or (I assume) from some segmentation fault ?
I do have some experience running xl2tpd. never seen anything like that.
sorry for naive question. @jkroonza do you maybe have core dump collection configured on that host ? it would be quite interesting to see "gdb bt"
We'd have to manually send SEGV since it doesn't actually crash, it merely stalls, I suspect (and I didn't have the savvy to use strace to at least get a system call or something that it blocks on) to maybe use /proc/${relevant_pid}/fd/ to try and figure out which socket is the problematic one.
is that "signal 15" arriving from xl2tpd or (I assume) from some segmentation fault ?
SIGTERM - was me doing kill ${relevant_pid} to terminate the relevant ppp that wasn't making progress.
I do have some experience running xl2tpd. never seen anything like that.
Keep in mind I'm running with some extra patches: https://github.com/xelerance/xl2tpd/pulls/jkroonza
Specifically https://github.com/xelerance/xl2tpd/pull/277 could cause other issues, and I have identified some additional new segfaults in xl2tpd, or more to the point - I suspect it triggered some existing use-after-free use-cases that was already there, but due to ordering of the select() code never showed effect, but I've not had time to sit down and dig into it properly yet with gdb.
use after free leads to segmentation fault and immediate process termination. likely it is not the case
I was more successful with adding debug "printf" than dancing with gdb