varnish-cache icon indicating copy to clipboard operation
varnish-cache copied to clipboard

Backtrace improvements for utilities

Open asadsa92 opened this issue 1 year ago • 1 comments

This PR adds a default back-trace to all asserts outside of varnishd, e.g., varnishstat. Also add back-trace to varnishtest asserts.

Example (artificially crafted asserts):


Before:

Assert error in do_once_cb_first(), varnishstat.c line 164:
  Condition((0) != 0) not true.
Aborted

After:

Assert error in do_once_cb_first(), varnishstat.c line 164:
  Condition((0) != 0) not true.
Backtrace:
  ip=0x7f3458bb6125 sp=0x7fff606013e0 <VBT_format+0x15>
  ip=0x7f3458bb66eb sp=0x7fff60601400 <VBT_dump+0x6b>
  ip=0x7f3458bb5394 sp=0x7fff60601450 <vas_default+0x1b4>
  ip=0x7f3458bb51d3 sp=0x7fff606024c0 <VAS_Fail+0x63>
  ip=0x5614ef51548a sp=0x7fff60602500 <do_once_cb_first+0x3a>
  ip=0x7f3458b963d3 sp=0x7fff60602540 <vsc_iter_seg+0x1c3>
  ip=0x7f3458b957bb sp=0x7fff60602590 <VSC_Iter+0x34b>
  ip=0x5614ef51514e sp=0x7fff60602610 <do_once+0xde>
  ip=0x5614ef514a8d sp=0x7fff60602650 <main+0x5ed>
  ip=0x7f345887124a sp=0x7fff606026c0 <__libc_init_first+0x8a>
  ip=0x7f3458871305 sp=0x7fff60602760 <__libc_start_main+0x85>
  ip=0x5614ef5143d1 sp=0x7fff606027b0 <_start+0x21>
Aborted

Before:

*    top   RESETTING after bin/varnishtest/tests/p00000.vtc
**** dT    0.472
**   s1    Waiting for server (5/-1)
**   v1    Wait
**** dT    0.474
---- v1    Assert error in varnish_ask_cli(), vtc_varnish.c line 110:  Condition((0) != 0) not true.  Errno=0 Success
*    top   failure during reset
#    top  TEST bin/varnishtest/tests/p00000.vtc FAILED (0.474) exit=2

After

*    top   RESETTING after bin/varnishtest/tests/p00000.vtc
**** dT    0.472
**   s1    Waiting for server (5/-1)
**   v1    Wait
**** dT    0.474
---- v1    Assert error in varnish_ask_cli(), vtc_varnish.c line 110:  Condition((0) != 0) not true.  Errno=0 Success
Backtrace:
  ip=0x55f0c01f3845 sp=0x7ffd49fc2480 <VBT_format+0x15>
  ip=0x55f0c01f3e0b sp=0x7ffd49fc24a0 <VBT_dump+0x6b>
  ip=0x55f0c01d453f sp=0x7ffd49fc24f0 <vtc_log_VAS_Fail+0x4f>
  ip=0x7fec618c41bb sp=0x7ffd49fc35c0 <VAS_Fail+0x4b>
  ip=0x55f0c01f043e sp=0x7ffd49fc3600 <varnish_ask_cli+0x3e>
  ip=0x55f0c01ee07d sp=0x7ffd49fc3640 <varnish_wait+0x7d>
  ip=0x55f0c01ed132 sp=0x7ffd49fc3660 <cmd_varnish+0x72>
  ip=0x55f0c01b0ddb sp=0x7ffd49fc36d0 <reset_cmds+0x2b>
  ip=0x55f0c01b0d13 sp=0x7ffd49fc36f0 <fail_out+0xa3>
  ip=0x55f0c01d38df sp=0x7ffd49fc3710 <vtc_logfail+0x3f>
  ip=0x55f0c01d3548 sp=0x7ffd49fc3720 <vtc_fatal+0x278>
  ip=0x55f0c01d4707 sp=0x7ffd49fc3830 <vtc_log_VAS_Fail+0x217>
  ip=0x7fec618c41bb sp=0x7ffd49fc4900 <VAS_Fail+0x4b>
  ip=0x55f0c01f043e sp=0x7ffd49fc4940 <varnish_ask_cli+0x3e>
  ip=0x55f0c01f1460 sp=0x7ffd49fc4980 <varnish_launch+0x1020>
  ip=0x55f0c01efcaa sp=0x7ffd49fc5aa0 <varnish_vclbackend+0xaa>
  ip=0x55f0c01ede7f sp=0x7ffd49fc5b00 <cmd_varnish+0xdbf>
  ip=0x55f0c01b0c43 sp=0x7ffd49fc5b70 <parse_string+0xc43>
  ip=0x55f0c01b12c3 sp=0x7ffd49fc6870 <exec_file+0x4c3>
  ip=0x55f0c01dbc01 sp=0x7ffd49fc68d0 <start_test+0x671>
  ip=0x55f0c01d9ebb sp=0x7ffd49fc7940 <main+0x70b>
  ip=0x7fec6151a24a sp=0x7ffd49fc89b0 <__libc_init_first+0x8a>
  ip=0x7fec6151a305 sp=0x7ffd49fc8a50 <__libc_start_main+0x85>
  ip=0x55f0c01a8eb1 sp=0x7ffd49fc8aa0 <_start+0x21>


*    top   failure during reset
#    top  TEST bin/varnishtest/tests/p00000.vtc FAILED (0.474) exit=2

asadsa92 avatar Jun 05 '24 14:06 asadsa92

bugwash: we should enable back-traces outside of varnishd with some kind of developer mode set in varnishd and exposed to VSM readers with a new Arg segment or something similar.

The developer mode should always be assumed by varnishtest (which is much more than a VSM reader) and as an intermediate measure this could be enabled at configure time and always enabled by autogen.des.

dridi avatar Jun 10 '24 13:06 dridi

Bumping this. I assume this can be merged after a rebase? @dridi I also assume you meant "developer mode" in varnishd can be done later?

asadsa92 avatar Nov 06 '24 09:11 asadsa92

I think @bsdphk insisted that we do not make this the default, that it should be controlled in varnishd but we could fall back to a configure-time option (eg. --enable-vut-bt) until we introduce this developer mode.

So I don't think it was acceptable as-is.

dridi avatar Nov 06 '24 10:11 dridi

@bsdphk @dridi I do not understand what the point would be about not printing the backtrace. Why would we need VSM complications just to not enable printing it?

nigoroll avatar Feb 12 '25 14:02 nigoroll

I don't remember why @bsdphk didn't want VUTs to give a back-trace by default and only add them to varnishtest. I hate to agree with @gquintard [1] but in the VUT case it might be better to make back-traces opt-in or opt-out with environment variables.

[1] random shot fired

dridi avatar Feb 12 '25 16:02 dridi

in the VUT case it might be better to make back-traces opt-in or opt-out with environment variables.

I agree with "if configurable, then through environment variables".

But my question remains what the worry is about having backtraces.

Thinking about it I realize that being able to turn the abort handler off has in fact been helpful in the past (I am having trouble though remembering exactly why, but it had something to do with getting a proper backtrace in a debugger (might have been on Solaris?) for just the abort from a SIGSEGV and not if the handler was active). So I think that would be more important than just turning the backtrace off.

nigoroll avatar Feb 12 '25 16:02 nigoroll

bugwash: send to syslog (suggested by phk), enable always

nigoroll avatar Feb 17 '25 14:02 nigoroll

@asadsa92 is this ready? If yes, would you please rebase+squash?

nigoroll avatar Mar 26 '25 16:03 nigoroll

@asadsa92 is this ready? If yes, would you please rebase+squash?

Yes, thanks for the reminder, it is now done.

asadsa92 avatar Mar 27 '25 08:03 asadsa92

bugwash: send to syslog (suggested by phk), enable always

Do we still need to do anything to send to syslog?

Can we please have a test case?

nigoroll avatar Apr 19 '25 11:04 nigoroll

bugwash: send to syslog (suggested by phk), enable always

Do we still need to do anything to send to syslog?

What has been done is to send VUT clients backtrace to syslog, see https://github.com/varnishcache/varnish-cache/pull/4116/commits/9770efd941c2c069c8eccb264803a317e4291d7e

@dridi argued that varnishtest should always print out the backtrace inside of the test case, see https://github.com/varnishcache/varnish-cache/commit/5f13c16f74d44daa4a5eb18eb451046998ed1a89#r154237979 Thus, I reverted that commit for varnishtest.

Can we please have a test case?

I have given this a try, but it seems tricky:

  • For VUT clients, the situation is such that we currently do not invoke VAS_Fail when we receive signals, e.g., from the kill command. Therefore it becomes tricky to crash the VUT client on-demand. I am not sure if it is intentional that we do not call the VAS_Fail for signals received by the VUT client like we do for cache-main, but changing this was deemed not appropriate here. I have tested manually triggering the crash using asserts and that send the expected backtrace to syslog.
  • For varnishtest, we would need a new command in the HTTP spec? Then, somehow verify the backtrace from the main thread? or invoke yet another varnishtest inside of the test case that calls abort and then catch that from the outside?

asadsa92 avatar Apr 22 '25 12:04 asadsa92

The reason I dont want them on by default is that the back-trace will be piped into people's automation.

bsdphk avatar May 14 '25 13:05 bsdphk

Intwash:

  • Last call until Monday.
  • Rebase

The rebase is now complete. A small change I did was to assume buf to VBT_dump() can not be NULL: https://github.com/varnishcache/varnish-cache/pull/4116/commits/9b90f2bdec567ee55a5eba65332abc7b9d5e2f91#diff-0e7e331483876bba2515b29b3712c8a835a564458815d1c9bd358b68e67cd3d4R166 The reason for this change was that this function only make sense when using fixed size buffers. VSB_init will assert that buf is not NULL

asadsa92 avatar May 14 '25 14:05 asadsa92