Improve cache size accounting
- Charge byte-based cashes for their CacheItems
- Note: we could also improve accuracy here by charging for alignment overhead, but I'd expect that to be a fairly small (and mostly constant) overhead compared to the others here.
- Charge the composite cache for the BitmapRef objects it owns
- Charge the bitmap cache for its OutlineHashValue
- The outline cache is also charge for this, but only per-glyph (not per-byte), and the bitmap cache can easily cause a large number of large outlines to be kept in memory far longer than the outline cache alone would.
The last commit is the most important here. On a sample with extensive use of the font "DINk", I see total memory allocated-and-persistent from ass_render_frame drop from 310MB to 180MB. This also results in the memory usage stabilizing at that value early in playback, rather than slowly growing over the duration.
I've tried that approach with combined LRU list and global cache limit, should follow limits more accurately: https://github.com/MrSmile/libass/commits/merge-cache-limits What do you think about it?
Hmmmmm, I like the concept a lot, but I'm not sure if it actually solves the problem of cached bitmaps potentially causing outline references to stick around without being counted towards the total cache size, even after the outlines themselves have been dropped? Maybe the answer is to count items towards the size until their refcount hits zero, even after they've been deref'd out of the cache. Though, I guess combining the lists should make that case a lot less likely, since the bitmap and outline should always be close to each other in the list…
Re: the change itself, my biggest concern here is how it'll interact with threading; I suppose we could either have the mutex live in the CacheList (and use that instead of the per-cache one when present), or have the list have a separate mutex that's used in addition to the cache's. That or we come up with some way to handle the lists lock-free (which I think might theoretically be possible? Not sure, though.)
I'm not sure if it actually solves the problem of cached bitmaps potentially causing outline references to stick around without being counted towards the total cache size
In that branch I've made outline cache byte-based and moved it into global LRU list. So it correctly contributes to total_size and cannot overflow cache maximal size. Bitmaps keep their outlines, but so combined bitmaps keep their individual counterparts, it shouldn't be a problem so long as total_size is calculating correctly.
my biggest concern here is how it'll interact with threading
I think the simplest solution would be to have a separate mutex for the list. They're mostly separate, so that looks like the most direct solution:
-
ass_cache_create(): cache mutex locked beforeconstruct_func(), list mutex—after,item->sizecan be made atomic; -
ass_cache_dec_ref(): locks cache mutex,list->total_sizecan be made atomic; -
ass_cache_cut(): locks list mutex globally, cache mutex per deleted item insideass_cache_dec_ref().
Lock-free lists are certainly possible, but it's quite hard to do it correctly and often it's not worth it.