
segmentation fault

Open ffolkes1911 opened this issue 10 months ago • 7 comments

ser2net segfaults roughly within a day on a certain PC. Here's some info from latest core dump:

TESTPC@TESTPC:~$ coredumpctl info
           PID: 3328456 (ser2net)
           UID: 1000 (TESTPC)
           GID: 1000 (TESTPC)
        Signal: 11 (SEGV)
     Timestamp: Tue 2025-06-03 15:34:05 EEST (6min ago)
  Command Line: ser2net -c /etc/ser2net/ser2net.yaml -n -d -l
    Executable: /usr/local/sbin/ser2net
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-f946c5d8-87f6-4db4-a1fe-815905422146.scope
          Unit: user@1000.service
     User Unit: vte-spawn-f946c5d8-87f6-4db4-a1fe-815905422146.scope
         Slice: user-1000.slice
     Owner UID: 1000 (TESTPC)
       Boot ID: 0cb88d3810584cf2bc32784d8aaa5a37
    Machine ID: ede2018e83ec4e2191c36c8c3208d2c2
      Hostname: TESTPC
       Storage: /var/lib/systemd/coredump/core.ser2net.1000.0cb88d3810584cf2bc32784d8aaa5a37.3328456.1748954045000000.zst (present)
  Size on Disk: 90.4K
       Message: Process 3328456 (ser2net) of user 1000 dumped core.
                
                Stack trace of thread 3328456:
                #0  0x00007edb9178b75d __strlen_avx2 (libc.so.6 + 0x18b75d)
                #1  0x0000630f94c59567 ser_control_set (ser2net + 0x8567)
                #2  0x00007edb91929883 serconf_call_cdone (libgensio_serialdev.so + 0x3883)
                #3  0x00007edb9196a4db process_runners (libgensioosh.so.10 + 0x164db)
                #4  0x00007edb9196c96f sel_select_intr_sigmask (libgensioosh.so.10 + 0x1896f)
                #5  0x00007edb91969636 i_wait_for_waiter_timeout (libgensioosh.so.10 + 0x15636)
                #6  0x00007edb919698b2 gensio_unix_wait (libgensioosh.so.10 + 0x158b2)
                #7  0x0000630f94c567a7 main (ser2net + 0x57a7)
                #8  0x00007edb9162a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
                #9  0x00007edb9162a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
                #10 0x0000630f94c57005 _start (ser2net + 0x6005)
                ELF object binary architecture: AMD x86-64

I can send over the dump itself if you need.

Error frequency:

start at Thu May 29 09:35:50 AM EEST 2025
Segmentation fault (core dumped)
start at Thu May 29 06:18:53 PM EEST 2025
Segmentation fault (core dumped)
start at Thu May 29 08:01:57 PM EEST 2025
Segmentation fault (core dumped)
start at Fri May 30 12:48:23 AM EEST 2025
Segmentation fault (core dumped)
start at Sat May 31 05:54:39 AM EEST 2025
Segmentation fault (core dumped)
start at Sat May 31 07:05:03 AM EEST 2025
Segmentation fault (core dumped)
start at Sat May 31 08:45:39 AM EEST 2025
Segmentation fault (core dumped)
start at Sat May 31 11:55:03 PM EEST 2025
Segmentation fault (core dumped)
start at Mon Jun  2 12:30:02 AM EEST 2025
Segmentation fault (core dumped)
start at Tue Jun  3 08:04:19 AM EEST 2025
Segmentation fault (core dumped)
start at Tue Jun  3 03:34:05 PM EEST 2025

ser2net.yaml (nothing really different from other computers):

%YAML 1.1
---
# This is a ser2net configuration file, tailored to be rather
# simple.
#
# Find detailed documentation in ser2net.yaml(5)
# A fully featured configuration file is in
# /usr/share/doc/ser2net/examples/ser2net.yaml.gz
# 
# If you find your configuration more useful than this very simple
# one, please submit it as a bugreport


admin: &admin1
  accepter: tcp,2000


#########################




# initial serial: A50285BI, full path: /dev/serial/by-path/pci-0000:00:14.0-usbv2-0:2.3:1.0-port0
connection: &con001
  accepter: telnet(rfc2217),tcp,38001
  connector: serialdev,/dev/serial/by-path/pci-0000:00:14.0-usbv2-0:2.3:1.0-port0,115200n81,local
  enable: on
  timeout: 18000
  options:
    max-connections: 7
...

The affected computer runs Ubuntu 24.04.2 on a Core i3-7100 CPU with ser2net version 4.6.5 (compiled from source), but the issue was present before that, maybe in 4.6.3 or 4.6.4. gensio was slightly out of date, but updating it did not help: the core dump is nearly identical, and the only change is the offset in serconf_call_cdone (libgensio_serialdev.so + 0x38ab).

Another computer with the same OS but a different CPU (Core i5-9600) has been running it for 4 days now with no problems, using ser2net v4.6.5 commit 7189a5d and gensio v2.8.14 commit 0b478936.

As a side note, it would be nice if the version arguments were the same for ser2net and gensio :D gensio uses --version but ser2net uses -v

ffolkes1911 avatar Jun 05 '25 11:06 ffolkes1911

Well that wasn't very defensive programming...

I wasn't checking for errors in one place. I'm assuming that the particular serial port you are using on the failing computer cannot do some specific operation, an RFC 2217 request asks for that operation, and it fails. The handler doesn't check the error code and tries to decode the value anyway.
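The pattern described above can be sketched as follows. This is a minimal illustration, not ser2net's actual code: `describe_control_result` and its parameters are hypothetical names, and the assumption is that the completion callback receives an error code plus a value string that is only valid on success. The `strlen` frame at the top of the stack trace is consistent with decoding a value that was never filled in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical completion handler for a serial control request.
 * err is 0 on success; val is only meaningful when err is 0.
 */
static int describe_control_result(int err, const char *val,
                                   char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    if (err) {
        /* Report the failure and never touch val. */
        snprintf(out, outlen, "control failed: error %d", err);
        return err;
    }
    if (val == NULL) {
        /* Success without a value: still nothing to decode. */
        snprintf(out, outlen, "control succeeded with no value");
        return 0;
    }
    /* Only now is it safe to call strlen() on the value. */
    snprintf(out, outlen, "value \"%s\" (length %zu)", val, strlen(val));
    return 0;
}
```

Checking the error first means a failed request produces a log line instead of a call to `strlen` on an invalid pointer.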

I've pushed a patch to GitHub that adds a log for that; it should solve the crash and help track down what is going on.

cminyard avatar Jun 05 '25 12:06 cminyard

The updated ser2net has been running for 3 days with no segfaults, so that particular problem is fixed. Now I see ser_control_set: Error setting ser2net control 7: Internal I/O error (or control 8) in the logs. I think those are:

# https://www.rfc-editor.org/rfc/rfc2217.html  

3. Special Com Port Control Commands
           Value      Control Commands
             7           Request DTR Signal State
             8           Set DTR Signal State ON

which should indicate problems with my USB FTDI devices? Would be nice to know which device was producing this error

ffolkes1911 avatar Jun 12 '25 07:06 ffolkes1911

> The updated ser2net has been running for 3 days with no segfaults, so that particular problem is fixed. Now I see ser_control_set: Error setting ser2net control 7: Internal I/O error (or control 8) in the logs. I think those are:
>
> # https://www.rfc-editor.org/rfc/rfc2217.html
>
> 3. Special Com Port Control Commands
>            Value      Control Commands
>              7           Request DTR Signal State
>              8           Set DTR Signal State ON

Ok, that's definitely the problem, but that's not the value in question. That number is an internal value, and it means setting the DTR (7) or RTS (8) lines.

I added some code to print a more useful message and not internal numbers.

> which should indicate problems with my USB FTDI devices? Would be nice to know which device was producing this error

I've modified the log to print the port's name and fixed the same issue in another place. That should help future users.
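A log line along those lines might be built like this. It is only a sketch: `control_name` and `format_control_error` are hypothetical helpers, the message format is invented for illustration, and the only mapping taken from this thread is 7 for DTR and 8 for RTS. Any other internal value is reported as unknown rather than guessed.

```c
#include <assert.h>
#include <stdio.h>

/* Map the internal control number to a readable name.
 * Only 7 (DTR) and 8 (RTS) are known from the discussion. */
static const char *control_name(int control)
{
    switch (control) {
    case 7:
        return "DTR";
    case 8:
        return "RTS";
    default:
        return "unknown";
    }
}

/* Format an error message that names the port and the control line.
 * Returns the snprintf() result (length that would have been written). */
static int format_control_error(char *out, size_t outlen, const char *port,
                                int control, const char *errstr)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "%s: error setting %s (control %d): %s",
                    port, control_name(control), control, errstr);
}
```

Called with a port name such as the connection name from the config, this identifies both the device and the signal involved in one line.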

Now to the base issue... The base code in gensio where the error is returned is:

	if (ioctl(fd, TIOCMGET, &nval) == -1)
	    return gensio_os_err_to_err(o, errno);
	if (get) {
	    *((int *) val) = !!(nval & TIOCM_DTR);
	} else {
	    if (val)
		nval |= TIOCM_DTR;
	    else
		nval &= ~TIOCM_DTR;
	    if (ioctl(fd, TIOCMSET, &nval) == -1)
		return gensio_os_err_to_err(o, errno);
	}

Either the first (fetching the current value) or second (setting the value) ioctl() operation is returning an EIO error, which just means an unspecified internal I/O error. I don't see any other way to get this particular error on this code path, so it appears that the problem is with your serial device or device driver.

The way things work, it's hard to know if this is a get or set operation at this point in the code. That information is not kept after the operation is issued.

I assume it's not issuing any kernel logs, so tracking this down would be hard. I would assume this is a bug in the actual device, not in the kernel driver. It could also be an issue with a bad cable, a bad hub, or something else like that, but that seems less likely; in that case you would see it on data transfer operations, too.

It may be that two operations on the device come in at the same time and the device gives an error on the second one. That's still a bug in the driver or the device. Tracing could be added to see if this is going on.

Sometimes you can upgrade the firmware on those devices.

cminyard avatar Jun 12 '25 13:06 cminyard

Thank you, I will update ser2net on Monday to avoid risking problems over the weekend. Funny thing is, that particular computer should barely receive data from serial devices, yet it has the most problems.

ffolkes1911 avatar Jun 13 '25 05:06 ffolkes1911

Any news on this?

cminyard avatar Jun 26 '25 17:06 cminyard

Sorry about the delay. I did update ser2net and was waiting to reproduce the error so I could confirm that the debug output was added; however, I have not seen a single error message on that machine...

On another note, I noticed that your changes only went to SourceForge. Was that intended?

ffolkes1911 avatar Jun 27 '25 06:06 ffolkes1911

> Sorry about the delay. I did update ser2net and was waiting to reproduce the error so I could confirm that the debug output was added; however, I have not seen a single error message on that machine...

Well of course, the bug knows we are looking....

> On another note, I noticed that your changes only went to SourceForge. Was that intended?

No, my bad. It's pushed to GitHub now. I need to automate that.

cminyard avatar Jun 27 '25 11:06 cminyard

Sorry about the delay. It seems the bug is still hiding, but the segfault is fixed and I have not seen any regression, so I'll close the issue.

ffolkes1911 avatar Aug 20 '25 04:08 ffolkes1911