httpd icon indicating copy to clipboard operation
httpd copied to clipboard

Add ap_get_mon_snap(ap_mon_snap_t *ms) to retrieve some averaged monitoring data from the server.

Open rainerjung opened this issue 5 years ago • 4 comments

The data will be updated during the monitor hook, ie. every 10 seconds.

Consumers are mod_systemd and mod_status.

More details:

Patch available at

home.apache.org/~rjung/patches/httpd-trunk-mon-snaps-v1_2.patch

Some notes:

  • in order to use the data from mod_systemd (monitor hook), which runs in the maim process, and also from mod_status, so from child processes, it needs to sit in shared memory. Since most of the data is tightly coupled to the scoreboard, I added the new cached data to the global part of the scoreboard.

  • it was then IMHO a good fit to move the existing ap_get_sload(ap_sload_t *ld) from server/util.c to server/scoreboard.c.

  • ap_sload_t is used as a collection of data which can be used to generate averaged data from a pair of ap_sload_t structs. It was extended to also contain the total request duration plus total usr and sys CPU and the timestamp when the data was taken. So it should now contain all data from which mod-systemd and mod_status currently like to display derived averaged values.

  • busy and idle in ap_sload_t have been changed from int to float, because they were already only used a percentages. IMHO it doesn't make sense to express idle and busy as percentages based on a total of busy plus idle, because that sum is not a meaningful total. The server can grow by creating new processes and suddenly your busy percentage might shrink, although the absolute number of busy threads increases. That's counter intuitive. So I added the unused process slots to the total count, and we have three counters that sum up to this total, busy, idle and dead. We need a better name than "dead" for these unused process slots that might get used later. "unused" is to close to "idle" and I don't have a good name yet.

  • the new ap_mon_snap_t contains a pointer to an ap_sload_t, six averaged values generated from two ap_sload_t and a member that conatins the time delta between those two ap_sload_t. The ap_sload_t referenced by ap_mon_snap_t contains the data at time t1. During the next monitor run, new t1 data will be collected and the previous t1 data will be used as t0 data to generate the new averages.

  • the scoreboard contains two ap_mon_snap_t (plus the two ap_sload_t referenced by them) to be able to update one of them without breaking access by consumers to the other one. After the update the roles get switched.

  • both structs, ap_sload_t and ap_mon_snap_t are declared in httpd.h, because ap_sload_t already had been there. t might be a better fit to move them to scoreboard.h, but I'm not sure about that and I don't know if that would be allowed by the compatibility rules.

  • also in httpd.h there are now three new function declarations. Two only used by server/core.c as hook functions:

    int ap_scoreboard_monitor(apr_pool_t *p, server_rec *s); void ap_scoreboard_child_init(apr_pool_t *p, server_rec *s);

and one for public consumption:

AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms);

  • mod_systemd and mod_status now call ap_get_mon_snap(&snap) to retrieve the latest averaged data. mod_status still uses the scoreboard in addition to retrieve up-to-date current scalar metrics. Small adjustments to the mod_status output plus additions to mod_systemd notification messages. Tne STATUS in the notification of mod_systemd now looks liek this:

STATUS=Pct Idle workers 28.4; Pct Busy workers 10.6; Pct Dead workers 60.9; Requests/sec: 5030; Bytes served/sec: 2.9MB/sec; Bytes served/request: 596 B/req; Avg Duration(ms): 5.78; Avg Concurrency: 29.1; Cpu Pct: 40.5

  • scoreboard.c changes:

    • take over ap_get_sload(ap_sload_t *sl) from server/util.c. Added copied code from mod_status to that function to also add up the request duration and the usr and sys CPU times.

    • ap_scoreboard_monitor() runs in the monitor hook. It calls ap_get_sload() and then a static utility function calc_mon_data() to calculate the new averages.

  • Minor mmn change not yet part of the patch.

It compiles fine (maintainer mode) on RHEL 7 x86_64 and on Solaris 10 Sparc and I did some tests with mod_status and mod_systemd.

rainerjung avatar Apr 27 '20 11:04 rainerjung

Open questions:

  • Any tricks how we can ensure, that ap_scoreboard_monitor() is only called by the core and not by modules? Is putting it into a private header file plus not using AP_DECLARE enough? For the moment I have documented it in the header file, like we do for two functions in httpd_log.h that are only meant for core use.
  • Any better name than "dead" for the slots, that are neither busy nor idle (and instead offer more headroom for the server to grow, so some kind of super-idle slots)? Or simply ignore idle and dead and use MaxRequestWorkers as the total number, calculate busy and let idle be the compliment? That would ignore hanging processes though ...
  • Check history of times_per_thread. Do we still need it? Does it work? See PR23795. It seems to have been useful and worked at some point (2003) for SGI Irix and probably a Linux thread implementation, that was already old in 2003. Maybe remove in trunk and keep for 2.4.x?

rainerjung avatar Apr 27 '20 17:04 rainerjung

Rainer, how about embedding sload in snap like the below, and thus switching between snap0 and snap1 in global_score without the need of sload0 and sload1?

diff --git a/include/httpd.h b/include/httpd.h
index 741eecd9c0..a200fd28d0 100644
--- a/include/httpd.h
+++ b/include/httpd.h
@@ -1438,6 +1438,7 @@ struct ap_sload_t {
  */
 typedef struct ap_mon_snap_t ap_mon_snap_t;
 struct ap_mon_snap_t {
+    ap_sload_t sload;
     /* averaging interval */
     apr_interval_time_t interval;
     /* accesses per second */
@@ -2538,7 +2539,7 @@ void ap_scoreboard_child_init(apr_pool_t *p, server_rec *s);
  * @param sl struct to populate with last absolute data;
  * NULL allowed, to ignore those.
  */
-AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms, ap_sload_t *sl)
+AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms)
                  AP_FN_ATTR_NONNULL((1));
 
 /**
diff --git a/include/scoreboard.h b/include/scoreboard.h
index 5088aea85b..90bcbb01be 100644
--- a/include/scoreboard.h
+++ b/include/scoreboard.h
@@ -131,8 +131,6 @@ typedef struct {
     struct tms times;
 #endif
     apr_uint32_t volatile snap_index;
-    ap_sload_t sload0;
-    ap_sload_t sload1;
     ap_mon_snap_t snap0;
     ap_mon_snap_t snap1;
 } global_score;
diff --git a/modules/arch/unix/mod_systemd.c b/modules/arch/unix/mod_systemd.c
index a57fe9f9fa..b7a533bce8 100644
--- a/modules/arch/unix/mod_systemd.c
+++ b/modules/arch/unix/mod_systemd.c
@@ -69,7 +69,6 @@ static int systemd_pre_mpm(apr_pool_t *p, ap_scoreboard_e sb_type)
 
 static int systemd_monitor(apr_pool_t *p, server_rec *s)
 {
-    ap_sload_t sload;
     ap_mon_snap_t snap;
     char bps[5];
     char bpr[5];
@@ -79,7 +78,7 @@ static int systemd_monitor(apr_pool_t *p, server_rec *s)
         return DECLINED;
     }
     
-    ap_get_mon_snap(&snap, &sload);
+    ap_get_mon_snap(&snap);
     apr_strfsize((apr_off_t)snap.bytes_per_sec, bps);
     apr_strfsize((apr_off_t)snap.bytes_per_acc, bpr);
     sd_notifyf(0, "READY=1\n"
@@ -89,8 +88,8 @@ static int systemd_monitor(apr_pool_t *p, server_rec *s)
                "Bytes served/request: %sB/req; "
                "Avg Duration(ms): %.3g; Avg Concurrency: %.3g; "
                "Cpu Pct: %.3g\n",
-               sload.idle, sload.busy,
-               sload.dead,
+               snap.sload.idle, snap.sload.busy,
+               snap.sload.dead,
                snap.acc_per_sec, bps, bpr,
                snap.ms_per_acc,
                snap.average_concurrency,
diff --git a/modules/generators/mod_status.c b/modules/generators/mod_status.c
index 39e3477ddf..32a406990c 100644
--- a/modules/generators/mod_status.c
+++ b/modules/generators/mod_status.c
@@ -212,7 +212,6 @@ static int status_handler(request_rec *r)
     float tick;
     int times_per_thread;
 #endif
-    ap_sload_t sload;
     ap_mon_snap_t snap;
 
     if (strcmp(r->handler, STATUS_MAGIC_TYPE) && strcmp(r->handler,
@@ -257,7 +256,7 @@ static int status_handler(request_rec *r)
         thread_busy_buffer = apr_palloc(r->pool, server_limit * sizeof(int));
     }
 
-    ap_get_mon_snap(&snap, &sload);
+    ap_get_mon_snap(&snap);
 
     nowtime = apr_time_now();
 #ifdef HAVE_TIMES
@@ -498,15 +497,15 @@ static int status_handler(request_rec *r)
 
             ap_rprintf(r, "Uptime: %ld\n", (long) (up_time));
             ap_rvputs(r, "LastSnap: ",
-                      ap_ht_time(r->pool, sload.timestamp, DEFAULT_TIME_FORMAT, 0),
+                      ap_ht_time(r->pool, snap.sload.timestamp, DEFAULT_TIME_FORMAT, 0),
                       "\n", NULL);
             ap_rprintf(r, "LastSnapEpoch: %" APR_TIME_T_FMT "\n",
-                       apr_time_as_msec(sload.timestamp));
+                       apr_time_as_msec(snap.sload.timestamp));
             ap_rprintf(r, "SnapInterval: %" APR_TIME_T_FMT "\n",
                        apr_time_as_msec(snap.interval));
-            ap_rprintf(r, "BusyPct: %d\n", sload.busy);
-            ap_rprintf(r, "IdlePct: %d\n", sload.idle);
-            ap_rprintf(r, "DeadPct: %d\n", sload.dead);
+            ap_rprintf(r, "BusyPct: %d\n", snap.sload.busy);
+            ap_rprintf(r, "IdlePct: %d\n", snap.sload.idle);
+            ap_rprintf(r, "DeadPct: %d\n", snap.sload.dead);
             ap_rprintf(r, "ReqPerSec: %.4g\n", snap.acc_per_sec);
             ap_rprintf(r, "BytesPerSec: %.4g\n", snap.bytes_per_sec);
             ap_rprintf(r, "BytesPerReq: %.4g\n", snap.bytes_per_acc);
@@ -528,15 +527,15 @@ static int status_handler(request_rec *r)
 #endif
 
             ap_rvputs(r, "<dt>Last snap ",
-                      ap_ht_time(r->pool, sload.timestamp, DEFAULT_TIME_FORMAT, 0),
+                      ap_ht_time(r->pool, snap.sload.timestamp, DEFAULT_TIME_FORMAT, 0),
                       NULL);
             ap_rprintf(r, " (%" APR_TIME_T_FMT ")",
-                       apr_time_as_msec(sload.timestamp));
+                       apr_time_as_msec(snap.sload.timestamp));
             ap_rprintf(r, ", interval %" APR_TIME_T_FMT "ms</dt>\n",
                        apr_time_as_msec(snap.interval));
-            ap_rprintf(r, "<dt>%d%% busy", sload.busy);
-            ap_rprintf(r, " - %d%% idle", sload.idle);
-            ap_rprintf(r, " - %d%% dead</dt>\n", sload.dead);
+            ap_rprintf(r, "<dt>%d%% busy", snap.sload.busy);
+            ap_rprintf(r, " - %d%% idle", snap.sload.idle);
+            ap_rprintf(r, " - %d%% dead</dt>\n", snap.sload.dead);
 
             ap_rprintf(r, "<dt>%.4g requests/sec - ", snap.acc_per_sec);
             format_byte_out(r, (unsigned long)(snap.bytes_per_sec));
diff --git a/server/scoreboard.c b/server/scoreboard.c
index 9513583559..7a16714a2d 100644
--- a/server/scoreboard.c
+++ b/server/scoreboard.c
@@ -189,8 +189,8 @@ AP_DECLARE(void) ap_init_scoreboard(void *shared_score)
     ap_assert(more_storage == (char*)shared_score + scoreboard_size);
     ap_scoreboard_image->global->server_limit = server_limit;
     ap_scoreboard_image->global->thread_limit = thread_limit;
-    ap_scoreboard_image->global->sload0.access_count = -1;
-    ap_scoreboard_image->global->sload1.access_count = -1;
+    ap_scoreboard_image->global->snap0.sload.access_count = -1;
+    ap_scoreboard_image->global->snap1.sload.access_count = -1;
 }
 
 /**
@@ -837,58 +837,58 @@ AP_DECLARE(void) ap_get_sload(ap_sload_t *sl)
     }
 }
 
-static void calc_mon_data(ap_sload_t *s0, ap_sload_t *s1,
-                          ap_mon_snap_t *snap)
+static void calc_mon_data(ap_mon_snap_t *last, ap_mon_snap_t *next)
 {
+    ap_sload_t *s0 = &last->sload;
+    ap_sload_t *s1 = &next->sload;
     unsigned long accesses;
 #ifdef HAVE_TIMES
     float tick;
 #endif
 
-    snap->acc_per_sec = -1;
-    snap->bytes_per_sec = -1;
-    snap->average_concurrency = -1;
-    snap->cpu_load = -1;
-    snap->bytes_per_acc = -1;
-    snap->ms_per_acc = -1;
-    snap->interval = -1;
+    next->acc_per_sec = -1;
+    next->bytes_per_sec = -1;
+    next->average_concurrency = -1;
+    next->cpu_load = -1;
+    next->bytes_per_acc = -1;
+    next->ms_per_acc = -1;
+    next->interval = -1;
 
     /* Need two iterations for complete data in s0 and s1 */
     if (s0->access_count < 0 || s1->access_count < 0) {
         return;
     }
 
-    snap->interval = (apr_interval_time_t) (s1->timestamp - s0->timestamp);
+    next->interval = (apr_interval_time_t) (s1->timestamp - s0->timestamp);
     accesses = s1->access_count - s0->access_count;
-    if (snap->interval > 0) {
-        snap->acc_per_sec = (float)accesses / snap->interval * APR_USEC_PER_SEC;
-        snap->bytes_per_sec = (float)(s1->bytes_served - s0->bytes_served)
-                              / snap->interval * APR_USEC_PER_SEC;
-        snap->average_concurrency = (float)(s1->duration - s0->duration)
-                                    / snap->interval;
+    if (next->interval > 0) {
+        next->acc_per_sec = (float)accesses / next->interval * APR_USEC_PER_SEC;
+        next->bytes_per_sec = (float)(s1->bytes_served - s0->bytes_served)
+                              / next->interval * APR_USEC_PER_SEC;
+        next->average_concurrency = (float)(s1->duration - s0->duration)
+                                    / next->interval;
 #ifdef HAVE_TIMES
 #ifdef _SC_CLK_TCK
         tick = sysconf(_SC_CLK_TCK);
 #else
         tick = HZ;
 #endif
-        snap->cpu_load = (float)(s1->cpu_usr - s0->cpu_usr
+        next->cpu_load = (float)(s1->cpu_usr - s0->cpu_usr
                                  + s1->cpu_sys - s0->cpu_sys)
-                              / tick / snap->interval * APR_USEC_PER_SEC;
+                              / tick / next->interval * APR_USEC_PER_SEC;
 #endif
     }
     if (accesses > 0) {
-        snap->bytes_per_acc = (float)(s1->bytes_served - s0->bytes_served)
+        next->bytes_per_acc = (float)(s1->bytes_served - s0->bytes_served)
                               / accesses;
-        snap->ms_per_acc = (float)(s1->duration - s0->duration)
+        next->ms_per_acc = (float)(s1->duration - s0->duration)
                            / accesses / 1000.0;
     }
 }
 
 int ap_scoreboard_monitor(apr_pool_t *p, server_rec *s)
 {
-    ap_sload_t *sload_last;
-    ap_sload_t *sload_next;
+    ap_mon_snap_t *snap_last;
     ap_mon_snap_t *snap_next;
     apr_uint32_t index;
 
@@ -898,59 +898,48 @@ int ap_scoreboard_monitor(apr_pool_t *p, server_rec *s)
 
     index = apr_atomic_read32(&ap_scoreboard_image->global->snap_index);
     if (index == 0) {
-        sload_last = &ap_scoreboard_image->global->sload0;
-        sload_next = &ap_scoreboard_image->global->sload1;
+        snap_last = &ap_scoreboard_image->global->snap0;
         snap_next = &ap_scoreboard_image->global->snap1;
         index = 1;
     } else {
-        sload_last = &ap_scoreboard_image->global->sload1;
-        sload_next = &ap_scoreboard_image->global->sload0;
+        snap_last = &ap_scoreboard_image->global->snap1;
         snap_next = &ap_scoreboard_image->global->snap0;
         index = 0;
     }
-    ap_get_sload(sload_next);
-    calc_mon_data(sload_last, sload_next, snap_next);
+    ap_get_sload(&snap_next->sload);
+    calc_mon_data(snap_last, snap_next);
     apr_atomic_set32(&ap_scoreboard_image->global->snap_index, index);
     return DECLINED;
 }
 
-AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms, ap_sload_t *sl)
+AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms)
 {
-    ap_sload_t *sload;
     ap_mon_snap_t *snap;
     apr_uint32_t index;
 
     if (ap_scoreboard_image == NULL) {
+        ms->sload.idle = -1;
+        ms->sload.busy = -1;
+        ms->sload.access_count = -1;
+        ms->sload.bytes_served = -1;
+        ms->sload.duration = -1;
+        ms->sload.timestamp = -1;
         ms->interval = -1;
         ms->acc_per_sec = -1;
         ms->bytes_per_sec = -1;
         ms->bytes_per_acc = -1;
         ms->ms_per_acc = -1;
         ms->average_concurrency = -1;
-        if (sl) {
-            sl->idle = -1;
-            sl->busy = -1;
-            sl->access_count = -1;
-            sl->bytes_served = -1;
-            sl->duration = -1;
-            sl->timestamp = -1;
-        }
         return;
     }
 
     index = apr_atomic_read32(&ap_scoreboard_image->global->snap_index);
     if (index == 0) {
-        sload = &ap_scoreboard_image->global->sload0;
         snap = &ap_scoreboard_image->global->snap0;
     } else {
-        sload = &ap_scoreboard_image->global->sload1;
         snap = &ap_scoreboard_image->global->snap1;
     }
-    /* This will overwrite our ms-sload pointer, need to reconstruct */
     memcpy(ms, snap, sizeof(*snap));
-    if (sl) {
-        memcpy(sl, sload, sizeof(*sload));
-    }
 }
 
 void ap_scoreboard_child_init(apr_pool_t *p, server_rec *s)

ylavic avatar Apr 28 '20 10:04 ylavic

The code started like that. The problem is: we can then no longer extend ap_sload_t and stay compatible. ap_sload-t and ap_mon_snap_t are both public. Our rule tells us, that changes should be appended to the end of the struct. If we embed an ap_sload_t in ap_mon_snap_t, then anything added to that struct will show up somewhere in the middle of ap_mon_snap_t and the offsets generated by older compilations will be wrong. And if we move the ap_sload_t inside ap_mon_snap_t to the end right from the beginning, we have the same problem with adding something directly to ap_mon_snap_t.

Separating the structs was the only solution I found that enables us to extend them later. The other solution would be to get rid of ap_mon_snap_t and pack everything into ap_sload_t. But I found that to be misleading about the use of the struct.

rainerjung avatar Apr 28 '20 10:04 rainerjung

Good point, thanks for the explanation.

ylavic avatar Apr 28 '20 10:04 ylavic