Limiting cache size?

Open OrangeDog opened this issue 7 years ago • 5 comments

It appears that neither DictCache nor FileCache provides any way to evict entries that aren't being accessed, either by age or by a maximum cache size.

It's only possible with Redis by configuring the database expiration externally.
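To make the request concrete, here is a minimal sketch of a size-bounded in-memory cache with least-recently-used eviction. It is not part of CacheControl. The class name `BoundedDictCache` and the `max_entries` parameter are made up for illustration, and the `get`/`set`/`delete`/`close` methods are assumed to mirror the interface DictCache exposes. The `expires` argument is accepted only for compatibility and is ignored here.

```python
from collections import OrderedDict
from threading import Lock


class BoundedDictCache:
    """Hypothetical in-memory cache that evicts the least recently used entry."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._lock = Lock()
        self._data = OrderedDict()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                return None
            # Mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]

    def set(self, key, value, expires=None):
        # `expires` is ignored in this sketch; eviction is by entry count only.
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            # Drop the oldest entries until we are back under the limit.
            while len(self._data) > self.max_entries:
                self._data.popitem(last=False)

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)

    def close(self):
        pass
```

Assuming the interface matches, an instance could be passed as the `cache` argument in place of DictCache. It bounds the number of entries, not the number of bytes, and it does not look at response expiry.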

OrangeDog avatar Apr 12 '18 14:04 OrangeDog

When using the FileCache a workaround might be to set up a cronjob that just deletes all "old" cache files:

find /path/to/requests_cache/ -type f -mtime +14 -delete && find /path/to/requests_cache/ -type d -empty -delete

Note that this does not check if the files are actually expired and also doesn't limit the actual disk usage, so your mileage may vary.
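If disk usage matters as well, the same idea can be sketched in Python: delete files older than a cutoff, then delete the oldest remaining files until the directory fits a size budget. The function name `prune_cache_dir` and its parameters are made up for this example. Like the `find` command, it only looks at modification times and knows nothing about whether a cached response has actually expired.

```python
import os
import time


def _remove_quietly(path):
    """Delete a file, ignoring the case where it has already disappeared."""
    try:
        os.remove(path)
        return True
    except OSError:
        return False


def prune_cache_dir(root, max_age_seconds=None, max_total_bytes=None):
    """Remove old files, then the oldest files until under the size budget.

    Returns the list of paths that were removed.
    """
    now = time.time()
    entries = []  # (mtime, size, path)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished while walking
            entries.append((st.st_mtime, st.st_size, path))

    removed = []
    kept = []
    for mtime, size, path in entries:
        if max_age_seconds is not None and now - mtime > max_age_seconds:
            if _remove_quietly(path):
                removed.append(path)
        else:
            kept.append((mtime, size, path))

    if max_total_bytes is not None:
        kept.sort()  # oldest modification time first
        total = sum(size for _mtime, size, _path in kept)
        for _mtime, size, path in kept:
            if total <= max_total_bytes:
                break
            if _remove_quietly(path):
                removed.append(path)
            total -= size
    return removed
```

For example, `prune_cache_dir('/path/to/requests_cache/', max_age_seconds=14 * 86400, max_total_bytes=2 ** 30)` would roughly match the cron job above with a 1 GiB cap added. Empty directories are left in place.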

jaap3 avatar May 22 '18 09:05 jaap3

My expectation is that once someone runs into this sort of problem, it is time to take better ownership of the cached data and consider something like an external store. It would be nice to support many different types of caches and usage patterns, but that gets complicated quickly, so I've avoided it in CacheControl proper.

With that said, I would think it could be a good idea to have a separate caches package that includes different implementations and can share common code such as a worker that can examine the cache implementation for stale entries, allowing folks to focus on support for specific storage systems instead.

ionrock avatar May 22 '18 14:05 ionrock

I did some testing, and it seems that python-diskcache can be used as a drop-in replacement for FileCache:

import requests

from cachecontrol import CacheControl
from diskcache import FanoutCache

class MyFanoutCache(FanoutCache):
    # Workaround until either grantjenks/python-diskcache#77 or ionrock/cachecontrol#195 is merged
    def __bool__(self):
        return True
    __nonzero__ = __bool__

cache = MyFanoutCache('./tmp', size_limit=2 ** 30, eviction_policy='least-recently-used')
session = CacheControl(requests.Session(), cache=cache)

Then you could periodically call cache.cull() to get the size back down.

However, it's not possible to remove expired items, because the cache itself is not aware of the expiry date of the response.

jaap3 avatar Jun 19 '18 09:06 jaap3

I got the FileCache working but decided to try the diskcache FanoutCache because I wanted the cull functionality. In testing, however, the FanoutCache does not appear to be used at all: files (cache.db) are created in the appropriate directories, but they are never populated with data. I went back to the FileCache for now, as it is working fine.

tedivm avatar Jul 27 '18 00:07 tedivm

@tedivm You are right, I checked and while the cache.db files are being created they never store any data. It turns out that the cache object from diskcache implements a __len__ method which returns 0 if there are no cache entries. CacheControl checks the incoming cache argument and falls back to DictCache if it's falsy.

I've created pull requests for both projects to correct this. (https://github.com/grantjenks/python-diskcache/pull/77, https://github.com/ionrock/cachecontrol/pull/195)

In the meantime, if you're still interested in trying out the FanoutCache, you could subclass it and patch its boolean conversion behavior.

i.e.:

class MyFanoutCache(FanoutCache):
    def __bool__(self):  # Python 3
        return True

    __nonzero__ = __bool__  # Python 2
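The falsy-cache fallback described above can be reproduced without either library. This is a minimal sketch under stated assumptions: every class and function name here is a hypothetical stand-in, and `pick_cache` assumes the check in CacheControl behaves like `cache or DictCache()`. It shows why an empty cache that defines `__len__` gets silently replaced, why the `__bool__` override avoids that, and how an explicit `is None` check sidesteps the problem.

```python
class EmptySizedCache:
    """Stand-in for a cache whose __len__ returns 0 while it is empty."""

    def __len__(self):
        return 0


class AlwaysTruthyCache(EmptySizedCache):
    """Same cache with the workaround: truthiness no longer depends on size."""

    def __bool__(self):
        return True


class FallbackCache:
    """Stand-in for the default cache used when none is supplied."""


def pick_cache(cache=None):
    # Truthiness check: an empty sized cache is falsy, so it is discarded.
    return cache or FallbackCache()


def pick_cache_fixed(cache=None):
    # Identity check: only a missing argument triggers the fallback.
    return FallbackCache() if cache is None else cache


empty = EmptySizedCache()
patched = AlwaysTruthyCache()
```

Here `pick_cache(empty)` returns a `FallbackCache`, while `pick_cache(patched)` and `pick_cache_fixed(empty)` both return the object that was passed in.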

jaap3 avatar Sep 11 '18 11:09 jaap3