Add CachedViews
Motivation
- Over at trakt-scrobbler, some of my confuse templates are getting more and more computationally expensive. Example: I have a template that needs to parse a list of strings as Regex patterns.
- Repeatedly computing the view value according to the template is not feasible in the hot path, so I have to settle with doing a
view.get(template)in the global scope once when the module loads. - My app is meant to be used as a long running service. Whenever the user changes a config value externally (via a cli), they have to restart the service to have their change registered, because of the limitation above.
- It would be good if the updated value could be reflected immediately from the next access of the view.
Insight
The main pattern to notice is that we are extremely likely to use the same template to parse any given view over time. Once we decide the specific template for a view, we want to keep calling get with the same template over and over again (in case the value has changed).
Method
This calls for the introduction of a caching layer that can store the values for expensive templates. In most cases, the config gets are much more frequent than sets, so it will benefit from such caching.
API
- The PR adds a new subclass of
ConfigView, called theCachedConfigView. - The main addition to it is the
get_handle(self, template)method. Instead of computing the value of the view according to the given template eagerly, it returns aCachedConfigHandleobject that can be used as a proxy for the value. - In the first call to
geton aCachedConfigHandle, the template is applied to the view, and the value is stored internally. All subsequentgets can re-use this cached value. - When the view is updated from elsewhere, all handle objects for that view are automatically invalidated, and the next call to
geton them will cause a re-computation of the value. - Note that we still retain the ability to call
getdirectly on aviewfor cases when the caching doesn't make sense (cheap to compute values).
Example
To view this in action, see the ipc branch of my app.
You can also get an overview of the API from the unit tests.
TODO
This is just a draft PR, tailored to my specific use-case. I am sure I will have missed some edge cases.
- [x] The nomenclature of
invalidateandunsetis misleading, and does not fit the usual meanings - an invalidated cache is usually supposed to be fixable by performing a recomputation, but that isunsethere. Theinvalidatedhere means the underlying view itself has been rendered invalid, probably because some of its parents' values changed without preserving the structure. Need to find better names for these. (Can't tackle both the hard things in Computer Science at once 😮💨) - [x] Need to add more docs/comments
- [ ] Add more test cases
- [x] Fix type hints; make them Py2 compatible maybe?
- [ ] Do we really need a
get_handleon theRootView? - [ ] Probably need to make it thread-safe
I will be happy to clean it up after an initial review from your end.
Related
This PR is kind of the other half of #130. Once this is done, I probably will need to solve the "persist changes to disk" problem too.
Awesome; this is a super cool idea! I’d love to incorporate this.
Here are some initial high-level impressions:
- A very small nit: maybe this new stuff should live in a
confuse.cachemodule instead of directly inconfuse.core. - I see that calling
geton an invalidated handle raises an error. It could be marginally more convenient to instead just automatically refresh the value in the handle when this happens. We still have the view and the template, so hopefully this is just a matter of callingview.get(template)? - This is not exactly an actionable note, but just to record some thoughts… your design got me thinking about whether it would be possible to eliminate the distinction between cached handles and views themselves. The special
CachedViewMixinwould take care of caching itself. That is, you could callview.cache_get(tmpl)or similar, and this would mean that future calls for the sameview.cache_get(tmpl)would return a cached value. The cached source would then just need to invalidate all these views. However, the problem with this would be that the view would need to maintain a template-to-value mapping... which would mean views would need to be equatable, which seems tricky. So the handle “layer” is probably here to stay.
I don’t think we need to worry about py2 support; we should officially drop that if we haven’t already.
Thanks for giving this a look!
maybe this new stuff should live in a
confuse.cachemodule
Will do.
automatically refresh the value in the handle when this happens
There is a slight confusion in the naming as I mentioned in the first TODO. The "invalid" state in the code today means that the underlying view itself has been invalidated - the key doesn't exist due to some structural modification. In this case, it is not possible to get any value from the current handle, no matter the template. We have to raise an exception in this case. See unit tests for some examples. Do note that this is will be a pretty uncommon occurance- the code is here for the sake of completeness.
For cases when the view does remain valid, we keep the value as undefined, and it does get refreshed on the next call to handle.get() by doing a view.get(template).
How about renaming the current undefined to invalid (to match general usage), and the current invalid to something like invalid_view or destroyed_view?
you could call
view.cache_get(tmpl)
I did explore this API initially, but as you noticed, it will require keeping a mapping from templates to values inside a view. The major issue here would be hashing the template on every call to cache_get. Many templates wrap over dicts and lists internally, which would have to be frozen before being hashed. This computation would be undesired for the common case, where the template structure doesn't change at all. The handle abstraction is superior for this reason IMO.
I don’t think we need to worry about py2 support; we should officially drop that if we haven’t already.
I see that the Build CI has failed on the Py2.7 setup because of the new type annotation syntax. Could you please specify what versions of Python would you like to target going forward? (I'd prefer >=3.6 due to the type syntax changes)
Aha, thank you for explaining! I obviously read your original explanation of the "invalid" state too quickly. This makes sense! I suppose I like the idea of renaming the state. Maybe something like "missing" would be most evocative?
(I know this is a stretch, but as someone who thinks about cache coherence protocols in my day job, I tend to think of "invalid" as meaning merely "I don't have the data locally, but I know where to go get it if anybody needs it." Not that this is the normal usage in the real world, of course…)
And good point about the CI and Python 2. Yes, the new minimum should be 3.6 (matching beets). I'll work on updating the docs, metadata, and Actions config…
I realized that ConfigHandleInvalidatedError that I'd cooked up is pretty much the same as a NotFoundError that is raised when the value (really, the view) is missing. So now we raise this error in the same circumstances.
Do we want to support providing a default value for this case as an arg to get_handle or CachedHandle.get()?
That unification of exception types seems like a good idea to me!
If I understand correctly, I think it's probably not necessary to add a second "layer" of defaults… that is, hopefully, you can get all the default behavior you want by using appropriate templates and fallback sources. Maybe put differently, it seems like the "law" should be that config.get_handle(view).get() should always produce the same value (or raise the same exception) as config.get(view); as a consequence, you can set up default fallbacks in the same way for both. Does that philosophizing make any kind of sense, roughly?
That does sound very clean. I have updated the code to call template.get_default_value() when we call handle.get() on a missing view. This will match the existing behaviour of view.get(template) pretty much perfectly (return default val if present in template, otherwise raise a NotFoundError).
Two things-
-
I took another look at the docs relating to "missing" views (the View Theory section). It says-
resolve a view at the same granularity you want config files to override each other
In essence, we will return an older value for a sub-view even when the parent view has been updated to change its structure.
It feels a bit counter-intuitive to me (why am I able to read an overridden value?), but since this is the expected behaviour, it seems like the "missing" case is not needed at all for our cache. This will simplify the invalidation logic considerably :) Will push an update soon.
-
Regarding the creation of non-ephemeral subviews, I do agree that it deviates from the existing behaviour. But as far as I can tell, all code paths in which a
Subviewis created go through the__getitem__method, which we have overriden (even iteration eventually doesself[index]). So I don't think it will pose any danger to return the same subview each time.On the other hand, I do agree that this might trip someone up (especially if the user tries to create a
CachedSubviewmanually). Keeping all the handles at the root does sound like a way around this, but I'm not too happy that a singlesetwould invalidate everything. I think it should be possible to maintain the entire tree of handles with the same structure as the views, kinda like a "shadow config". (Though I'm not sure what happens in the "missing" case)
@sampsyo Do we want to resume this?