Add a facility for unlocked director iteration
Pondering #4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That is in addition to the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea.
So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of all vcls), but shows an attempt at tackling the "iterate over one VCL's backends" case.
We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is not held and any request for a director to retire only results in a resignation, which manifests as the director being put on a second list.
When all iterators have completed, the resignation list is worked through and the actual retirements are carried out.
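To make the mechanism concrete, here is a minimal sketch of the resignation path; the names (vdire->iterating, the resigning list, vcldir_retire()) follow the patch, but the body is simplified and not the committed code:

static void
vdire_resign(struct vdire *vdire, struct vcldir *vdir)
{
	const struct vcltemp *temp = NULL;

	Lck_Lock(vdire->mtx);
	if (vdire->iterating == 0) {
		/* no iterators active: remove from the list and retire now */
		VTAILQ_REMOVE(&vdire->directors, vdir, directors_list);
		temp = *vdire->tempp;
	} else {
		/* iterators active: only resign, retire when the last ends */
		VTAILQ_INSERT_TAIL(&vdire->resigning, vdir, resigning_list);
	}
	Lck_Unlock(vdire->mtx);

	if (temp != NULL)
		vcldir_retire(vdir, temp);
}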
NOTE: I am not particularly happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and the end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces. See https://github.com/varnishcache/varnish-cache/pull/4142#issuecomment-2262299121
After rebasing on the newly understood 4e2c50da9b4b66a39191dcc7ea11bc4e3d11ab6c, I have successfully run my stress test on this patch without any issues.
What I would like to see is a FOREACH macro which does both the start and end
Here's a macro which does that:
/*
 * this macro might look a bit daunting, but it is really just two things:
 *
 * - the outer for loop is only to declare the local "next" variable
 *   and execute the inner for loop once
 *
 * - the inner for loop is VTAILQ_FOREACH_SAFE with calls to
 *   vdire_{start,end}_iter() added at the right places:
 *   start is called at initialization and end when var becomes NULL
 *   and the loop ends
 *
 * problem: none of break, goto or return may be used from the loop body
 */

#define _VDIRE_FOREACH(var, vdire, next)				\
	for (struct vcldir *next = (void *)&next;			\
	    next == (void *)&next;					\
	    next = NULL)						\
		for (vdire_start_iter(vdire),				\
		    (var) = VTAILQ_FIRST(&(vdire)->directors);		\
		    ((var) || (vdire_end_iter(vdire), 0)) &&		\
		    ((next) = VTAILQ_NEXT(var, directors_list), 1);	\
		    (var) = (next))

#define VDIRE_FOREACH(var, vdire)					\
	_VDIRE_FOREACH(var, vdire, VUNIQ_NAME(_vdirenext))
As stated in the comment, none of break, goto or return may be used from within the loop body. So I think I prefer the explicit vdire_{start,end}_iter() calls.
Putting this out of draft status despite not having worked on it any more, because it has received no feedback and I think it's actually important - now having seen one production case where this is at least a potential candidate.
bugwash: good idea by phk: New iterators encountering retired backends should wait until they are cleared, otherwise a constant stream of iterators could prevent retirement. I do not think this is a realistic scenario at this point, but we might as well do this right from the start.
another feedback question is if we can simplify this patch with an r/w lock. I am sure that I did look at this for quite some time and came to the conclusion that it does not do much to help us, but I will revisit this question.
New iterators encountering retired backends should wait until they are cleared
done: 41c42a05cd364003f8ad7f4ef78f53c5ea9a35d2
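In rough terms the waiting looks like this (sketch only, assuming the condvar is broadcast when the resigning list is drained; not the literal commit):

static void
vdire_start_iter(struct vdire *vdire)
{
	Lck_Lock(vdire->mtx);
	/* wait for pending retirements so iterators cannot starve them */
	while (!VTAILQ_EMPTY(&vdire->resigning))
		(void)Lck_CondWait(&vdire->cond, vdire->mtx);
	vdire->iterating++;
	Lck_Unlock(vdire->mtx);
}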
another feedback question is if we can simplify this patch with an r/w lock.
I do not see the simplification. With an rwlock, vdire_resign() would try to wrlock and, if this fails, would need to start a thread which would then wrlock and do the remove.
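Spelled out, that alternative would be something along these lines (purely illustrative; vdire->rwlock and the deferred-removal helper are hypothetical names, not part of the patch):

static void
vdire_resign_rwlock(struct vdire *vdire, struct vcldir *vdir)
{
	if (pthread_rwlock_trywrlock(&vdire->rwlock) == 0) {
		/* no readers (iterators): remove and retire right away */
		VTAILQ_REMOVE(&vdire->directors, vdir, directors_list);
		pthread_rwlock_unlock(&vdire->rwlock);
		vcldir_retire(vdir, *vdire->tempp);
	} else {
		/* readers active: hand the work to a thread which blocks on
		 * the wrlock and then removes; not simpler than the two lists */
		vdire_defer_resign(vdire, vdir);	/* hypothetical helper */
	}
}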
I'm OK with this, as long as this iterator API is only for use by the CLI thread, and this should be documented in the code with an ASSERT_CLI() because allowing multiple threads to independently and concurrently iterate at the same time will require further analysis. (For instance, should we allow two different iterators to have their func called on the same director at the same time ?)
We should (probably) hand the resigned_list to a worker thread, rather than bog down the CLI thread with potentially many cleanup calls.
I think the iterator should skip directors which have already been retired, no point in the CLI processing them if they are already (potentially long since) gone.
Apart from that: OK with me.
The next force-push has these changes (besides the rebase and typo fix):
skip resigned backends
we can identify these by a zero refcount. It is already illegal to assign a director with a zero refcount.
diff --git a/bin/varnishd/cache/cache_vcl.c b/bin/varnishd/cache/cache_vcl.c
index 12ed136a7..918537e50 100644
--- a/bin/varnishd/cache/cache_vcl.c
+++ b/bin/varnishd/cache/cache_vcl.c
@@ -469,6 +469,8 @@ vdire_iter(struct cli *cli, const char *pat, const struct vcl *vcl,
 
 	vdire_start_iter(vdire);
 	VTAILQ_FOREACH(vdir, &vdire->directors, directors_list) {
+		if (vdir->refcnt == 0)
+			continue;
 		if (pat != NULL && fnmatch(pat, vdir->cli_name, 0))
 			continue;
 		found++;
this is done only for vdire_iter() and not vcl_BackendEvent(), because vcldir_retire() relies on prior temperature events being delivered.
ASSERT_CLI()
... is added. I expect this to be revisited at some point.
Deferred call to vcldir_retire():
I have actually written the code and reflected on it: I am not going to commit this change because it will open another can of racing worms. In particular, when we cool a VCL and unload a vmod, we need to be sure that all objects using code from that vmod are gone. Delaying the actual destruction adds further complications, and the goal here is to not introduce additional complexity, but stabilize the code by handling the complications already introduced with reference counting.
On the contrary, pondering this question, I have added this change out of prudence to ensure that before the final iteration over the directors list, we have finished all retirements:
diff --git a/bin/varnishd/cache/cache_vcl.c b/bin/varnishd/cache/cache_vcl.c
index aca2b5cab..2140bc664 100644
--- a/bin/varnishd/cache/cache_vcl.c
+++ b/bin/varnishd/cache/cache_vcl.c
@@ -585,6 +585,14 @@ vcl_KillBackends(const struct vcl *vcl)
 	vdire = vcl->vdire;
 	CHECK_OBJ_NOTNULL(vdire, VDIRE_MAGIC);
 
+	/*
+	 * ensure all retirement has finished (vdire_start_iter waits for it)
+	 */
+	vdire_start_iter(vdire);
+	vdire_end_iter(vdire);
+	AZ(vdire->iterating);
+	assert(VTAILQ_EMPTY(&vdire->resigning));
+
 	/*
 	 * Unlocked and sidelining vdire because no further directors can be added, and the
 	 * remaining ones need to be able to remove themselves.
Not sure how much attention it got, but after this was merged we had a complaint from Coverity Scan:
*** CID 1641631: (USE_AFTER_FREE)
/bin/varnishd/cache/cache_vcl.c: 435 in vdire_end_iter()
429 VTAILQ_REMOVE(&vdire->directors, vdir, directors_list);
430 temp = *vdire->tempp;
431 PTOK(pthread_cond_broadcast(&vdire->cond));
432 }
433 Lck_Unlock(vdire->mtx);
434
>>> CID 1641631: (USE_AFTER_FREE)
>>> Dereferencing freed pointer "vdir".
435 VTAILQ_FOREACH(vdir, &resigning, resigning_list)
436 vcldir_retire(vdir, temp);
437 }
438
439 // if there are no iterators, remove from directors and retire
440 // otherwise put on resigning list to work when iterators end
Probably needs to use VTAILQ_FOREACH_SAFE instead ?
-> c63bd508841d3781dc99dc3109b331fffc240021 thank you
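For completeness, the fixed loop presumably ends up roughly like this (sketch of the _SAFE variant, not the literal commit):

	struct vcldir *vdir, *next;

	/* vcldir_retire() frees vdir, so fetch the next element first */
	VTAILQ_FOREACH_SAFE(vdir, &resigning, resigning_list, next)
		vcldir_retire(vdir, temp);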