pacemaker Unit tests for some env option functions

Closes T493.

Aug 15 '22 10:08 nrwahl2

Build failure is just fedora rawhide RPM

Aug 15 '22 21:08 nrwahl2

The related PRs have been merged, this can be rebased and updated

Aug 17 '22 20:08 kgaillot

@clumens , could you review this one when you get a chance? Thanks

Aug 18 '22 15:08 kgaillot

Updated to address review.

Check snprintf() for negative return value and respond as follows (https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r949286427):
- Log an error message.
- Return NULL from pcmk__env_option().
- Return from pcmk__set_env_option(). This function should probably be updated to return a standard Pacemaker return code, and its callers (at least some) should check the return code. That's outside the scope of this PR.
Mock snprintf() and include tests for the above cases.
Simplify tests disabled_null() test for pcmk__env_option_enabled(). Only use one input condition that causes pcmk__env_option() to return NULL. (https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r949292623)
Use PCMK__NELEM() in loop control. (https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r949339751)
Use separate mock booleans for getenv(), setenv(), unsetenv(). (https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r949354714)
Allow mocked setenv() and unsetenv() to set errno based on will_return(). (https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r949356975)

Aug 21 '22 02:08 nrwahl2

Really strange test errors... in both Ubuntus and in F36 Power9:

snprintf_error() in pcmk__env_option_test.c results in a call to getenv(), even though pcmk__env_option() returns before calling getenv() when snprintf() returns a negative value.

19:59:04 not ok 2 - snprintf_error
19:59:04 FAIL: pcmk__env_option_test 2 - snprintf_error
19:59:04 # No entries for symbol __wrap_getenv.
19:59:04 # mock.c:90: error: Could not get value to check parameter name of function __wrap_getenv
19:59:04 # There were no previously declared parameter values for this test.

snprintf_error() in pcmk__set_env_option_test.c does not result in any call to snprintf(), even though pcmk__set_env_option() explicitly calls snprintf().

19:59:04 FAIL: pcmk__set_env_option_test 2 - snprintf_error
19:59:04 # %s() has remaining non-returned values.
19:59:04 # pcmk__set_env_option_test.c:49: note: remaining item was declared here
19:59:04 # __wrap_snprintf: '%s' parameter still has values that haven't been checked.
19:59:04 # pcmk__set_env_option_test.c:46: note: remaining item was declared here
19:59:04 # '%s' parameter still has values that haven't been checked.
19:59:04 # pcmk__set_env_option_test.c:47: note: remaining item was declared here
19:59:04 # '%s' parameter still has values that haven't been checked.
19:59:04 # pcmk__set_env_option_test.c:48: note: remaining item was declared here

All the other distros work fine. It's not an endianness issue, since the affected systems are x86_64. Things I've tried:

Removing restrict from the parameters in the __wrap_snprintf() signature, to make sure it doesn't mess with the calls to vsnprintf().
Set up an Ubuntu 22.04 LTS VM and confirmed that __wrap_snprintf() never gets called. Apparently it's ignoring the wrapper completely.
I found this, but the solution doesn't help; I'm already passing arguments after the format string: https://stackoverflow.com/a/39189436/7660197

Edit: It is a compiler optimization issue... setting -O0 in CFLAGS makes it go away. It looks like on Ubuntu, gcc is replacing snprint() with __builtin___snprintf_chk(). That's fine and dandy except that -fno-builtin is not making it go away like it should. I haven't found any answers yet. Ubuntu uses GCC v11 and Fedora 36 uses GCC v12. I don't know what versions our other test machines use but I'm sure the EL7 machines are on older versions. I reached out to the gcc-help mailing list. We could always skip testing that code path for now, but we need to test it eventually, which will require mocking snprintf().

Edit: Adding -U_FORTIFY_SOURCE to CFLAGS fixed the Ubuntu issue. Added a commit to this PR to do that. F36 Power9 LE is still failing with the same errors listed above, and I don't have a reproducer system for that currently.

Aug 21 '22 19:08 nrwahl2

I got my hands on F36 ppc64le. This change to add 128-bit floating point is what breaks the "snprintf_error" unit tests there: https://fedoraproject.org/wiki/Changes/PPC64LE_Float128_Transition

Running gdb on a minimal reproducer:

Breakpoint 1, func (s=s@entry=0x10020020 "hello world") at test.c:5
5	    snprintf(buf, 16, "%s", s);
(gdb) s
0x00007ffff7d54a98 in ___ieee128_snprintf (s=0x7fffffffeac0 "\360\352\377\377\377\177", maxlen=16, format=0x10020010 "%s") at ../sysdeps/ieee754/ldbl-128ibm-compat/ieee128-snprintf.c:24

Special cases and hidden overrides like this may keep popping up for snprintf() in the future. Maybe use a different approach for mocking stuff like Xprintf?

I think I should remove the snprintf() mocking code and the two unit tests that use it tomorrow, and postpone that to another day. I've come up with some solutions but I need to play with it a little more. Everything feels kind of hacky.

Fabio pointed out another approach that was used in libqb that might work: https://github.com/ClusterLabs/libqb/pull/231/commits/f610b1b1. Note _failure_injection.c. I haven't tried it yet.

Aug 22 '22 05:08 nrwahl2

Removed the -U_FORTIFY_SOURCE and __wrap_snprintf() commits from this PR. Mocking snprintf() will require a different approach or a hack, so it's better addressed separately. I also removed the snprintf_error() test cases from the pcmk__env_option() and pcmk__set_env_option() unit test files. We can add those back when snprintf() is taken care of.

As of right now, __wrap_setenv() and __wrap_unsetenv() still don't call __real_X, based on my comment in https://github.com/ClusterLabs/pacemaker/pull/2801#discussion_r951846421 about symmetry and the lack of need to use the real environment (we have a mocked getenv() we can use). If either you or @clumens feels strongly about calling __real_X, I can easily update both __wrap_X functions to make that call.

Aug 22 '22 20:08 nrwahl2