mdsplus icon indicating copy to clipboard operation
mdsplus copied to clipboard

Segfault when writing an empty array to MDSplus

Open margomw opened this issue 4 months ago • 5 comments

Affiliation General Atomics

Version(s) Affected Client version 7.153 Server version 7.153

Platform(s) Linux Centos 7

Installation Method(s) MDSplus.org repository as rpms

Describe the bug MDSput2 segfaulted when writing an array of zero length. Make the code more robust and handle the error more gracefully.

To Reproduce Steps to reproduce the behavior: Write a C client of MDS Write a empty array to mds using mdsput2() function (part of MDS client)

Expected behavior Error condition or if the behavior is valid, allow the request to proceed

Screenshots If applicable, add screenshots to help explain your problem.

Additional context Core file

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/data/pcshome/margomw/git/d3d/INTEL64_DIR/host_cpu1 STANDALONE d3pcs3 11234 d3p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f7cd6d03720 in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f7cd6d03720 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007f7cd8020ddb in copy_dx () from /usr/local/mdsplus/lib/libMdsShr.so
#2  0x00007f7cd8021abf in MdsCopyDxXdZ () from /usr/local/mdsplus/lib/libMdsShr.so
#3  0x00007f7cd697a44f in tdi_yacc_ARG () from /usr/local/mdsplus/lib/libTdiShr.so
#4  0x00007f7cd697bf15 in tdi_yacc () from /usr/local/mdsplus/lib/libTdiShr.so
#5  0x00007f7cd68814f5 in compile () from /usr/local/mdsplus/lib/libTdiShr.so
#6  0x00007f7cd6881a67 in Tdi1Compile () from /usr/local/mdsplus/lib/libTdiShr.so
#7  0x00007f7cd693dd7f in TdiIntrinsic () from /usr/local/mdsplus/lib/libTdiShr.so
#8  0x00007f7cd68aafc8 in TdiCompile () from /usr/local/mdsplus/lib/libTdiShr.so
#9  0x00007f7cd823ae3e in mds_put2_vargs () from /usr/local/mdsplus/lib/libMdsLib.so
#10 0x00007f7cd823dfd8 in mdsput2_ () from /usr/local/mdsplus/lib/libMdsLib.so

From the GA code

#11 0x000000000043171c in kTD_mdsPutBasic (d1=<optimized out>, val=0x0, units=0x0, node=0x7ffdb59a9d00 "PCS.PHASESTK1:VHASCII")
    at ppplLib/kToolData_mds.c:155
#12 kTD_mdsPutMatrix (dtype=6, matrix=0x0, dims=<synthetic pointer>, nDims=1, units=0x0, node=0x7ffdb59a9d00 "PCS.PHASESTK1:VHASCII")
    at ppplLib/kToolData_mds.c:163
#13 kToolData_mdsPutArrayInt8 (n=n@entry=0x7ffdb59a9d00 "PCS.PHASESTK1:VHASCII", u=u@entry=0x0, sz=sz@entry=0, v=v@entry=0x0)
    at ppplLib/kToolData_mds.c:178
#14 0x000000000042e65f in writePfiMds (ptname=ptname@entry=0x7ffdb59aa390 "PHASESTK1", pthead=pthead@entry=0x7ffdb59aa2a0)

margomw avatar Sep 09 '25 18:09 margomw

This except is from @margomw's email of 9-Sep-2025.

We ran into this bug that causes mdsput2_ to seg fault. when writing an empty array. 
The workaround we needed to implement is

if (narray > 0 ) kTD_mdsPutMatrix(bla bla);

if narray happens to be 0, kTD_mdsPutMatrix() failed.

mwinkel-dev avatar Sep 16 '25 20:09 mwinkel-dev

Hi @margomw,

In the above example, is narray actually the array? Or is it the number of elements in some array?

A conjecture (perhaps wrong) is that GA's program has a pointer that normally points to a non-empty array. But when the array is empty, that the pointer is NULL (aka zero). The rest of this post assumes that conjecture is true.

Note that descr2() and MdsPut2() are variadic functions that are NULL terminated. Thus, passing an array pointer that is zero is treated as the terminating NULL that indicates the end of the argument list.

Stated another way, there is a loop in the MDSplus code that uses C's va_start(), va_arg(), and va_end() macros to process the varying number of arguments. An argument that is supposed to point to an array, but instead is NULL, causes an abnormal early termination of the loop, thus not all arguments are processed, causing MdsPut2() to segfault.

The segfault triggered by the NULL array pointer has been reproduced, and is expected.

The workaround you described above is good, so should be used.

The va_arg() processing occurs in the mds_put2_vargs() function of the MdsLib.c file. Note that MdsPut2() can accept complicated expressions that involve many terms. That makes this issue complex.

For example, an expression that had 32 terms would require mds_put2_vargs() to process 32 descriptors and 32 data pointers (plus the NULL terminator). If one of those 32 terms has a NULL data pointer, what should be done? Should it always ignore that term (thus hiding errors from the user)? Should it segfault? Should it throw a special exception? Other? If the mds_put2_vargs() code is changed, how will it affect the entire customer base? What is the likelihood that modifying this code would inadvertently introduce new bugs?

In summary, the core developers must discuss this issue before any changes are made to MdsPut2() and/or mds_put2_vargs().

CORRECTION:. This post is only partially correct. Refer to posts below for more details.

mwinkel-dev avatar Sep 16 '25 21:09 mwinkel-dev

@mwinkel-dev : Mark, nwsacii / narray is an integer primitive type int

margomw avatar Sep 18 '25 16:09 margomw

Hi @margomw,

Thanks for explaining that narray is an integer that gives the array size.

Note that the explanation given two posts above is not entirely correct. To explain the issue, consider this example.

int dtype_long = DTYPE_LONG;
int zero = 0;
int mydata[5] = {1, 2, 3, 4, 5};

//  A) This writes [1, 2, 3] to the array1 node
int dim1 = 3;
int idesc = descr2(&dtype_long, &dim1, &zero);
int *array = mydata;
MdsPut2("array1", " $ ", &idesc, array, &zero);

// B) This writes a one element array of garbage to the node.
dim1 = 1;
idesc = descr2(&dtype_long, &dim1, &zero);
array = NULL;
MdsPut2("array1", " $ ", &idesc, array, &zero);

// C) This writes [1] to the node
dim1 = 0;
idesc = descr2(&dtype_long, &dim1, &zero);
array = mydata;
MdsPut2("array1", " $ ", &idesc, array, &zero);

// D) This segfaults
dim1 = 0;
idesc = descr2(&dtype_long, &dim1, &zero);
array = NULL;
MdsPut2("array1", " $ ", &idesc, array, &zero);  

Please check the call to MdsPut2() in GA's program (when the array is empty) and confirm that it is a per case D) above. Namely that 1) the descriptor's dimension is zero and 2) that the array pointer is NULL. That is surely the situation. (But if not, it would mean that there is a case E that also has to be investigated.)

Experiments will now be done to see if a minimal change can detect the NULL data pointer passed to MdsPut2() and throw an exception instead of triggering a segfault.

mwinkel-dev avatar Sep 19 '25 20:09 mwinkel-dev

Similar segfaults also occur with the MdsPut() function.

// This writes [1, 2, 3] to the array1 node
dim1 = 3;
array = mydata;
idesc = descr(&dtype_long, array, &dim1, &zero);
MdsPut("array1", " $ ", &idesc, &zero);

// This segfaults
dim1 = 1;
array = NULL;
idesc = descr2(&dtype_long, array, &dim1, &zero);
MdsPut("array1", " $ ", &idesc, &zero);

// This writes a one elment array of garbage
dim1 = 0;
array = mydata;
idesc = descr2(&dtype_long, array, &dim1, &zero);
MdsPut("array1", " $ ", &idesc, &zero);

// This segfaults
dim1 = 0;
array = NULL;
idesc = descr2(&dtype_long, array, &dim1, &zero);
MdsPut("array1", " $ ", &idesc, &zero);   

mwinkel-dev avatar Sep 19 '25 22:09 mwinkel-dev