Limiting cache size?
It appears that neither DictCache nor FileCache provides any way to evict entries that aren't being accessed, either by age or by a maximum cache size.
It's only possible with Redis by configuring the database expiration externally.
When using the FileCache a workaround might be to set up a cronjob that just deletes all "old" cache files:
find /path/to/requests_cache/ -type f -mtime +14 -delete && find /path/to/requests_cache/ -type d -empty -delete
Note that this does not check if the files are actually expired and also doesn't limit the actual disk usage, so your mileage may vary.
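For anyone who would rather run this from Python than from cron, here is a minimal sketch of roughly the same cleanup. The function name `prune_cache_dir` and its parameters are made up for illustration and are not part of CacheControl. Like the `find` command, it only looks at modification time, so it neither checks HTTP expiry nor caps disk usage. Unlike `find`, it leaves the root directory in place even when it ends up empty.

```python
import os
import time


def prune_cache_dir(root, max_age_days=14):
    """Delete files under root older than max_age_days, then empty subdirectories.

    Hypothetical helper: age is judged by mtime only, not by HTTP expiry.
    Returns the number of files removed.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    # Walk bottom-up so children are handled before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                # The file may have vanished or be unreadable; skip it.
                pass
        if dirpath != root:
            try:
                os.rmdir(dirpath)  # succeeds only if the directory is empty
            except OSError:
                pass
    return removed
```

Calling `prune_cache_dir('/path/to/requests_cache/')` from a scheduled job would then stand in for the cron line above.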
My expectation is that when someone runs into this sort of problem, it is time to take better ownership of the cached data and consider using something like an external store. It would be nice to support many different types of caches along with the usage, but because that gets really complicated, I've avoided it in CacheControl proper.
With that said, I would think it could be a good idea to have a separate caches package that includes different implementations and can share common code such as a worker that can examine the cache implementation for stale entries, allowing folks to focus on support for specific storage systems instead.
I did some testing, and it seems that python-diskcache can be used as a drop-in replacement for FileCache:
import requests
from cachecontrol import CacheControl
from diskcache import FanoutCache

class MyFanoutCache(FanoutCache):
    # Workaround until either grantjenks/python-diskcache#77 or #195 is fixed
    def __bool__(self):
        return True
    __nonzero__ = __bool__

cache = MyFanoutCache('./tmp', size_limit=2 ** 30, eviction_policy='least-recently-used')
session = CacheControl(requests.Session(), cache=cache)
Then you could periodically call cache.cull() to get the size back down.
However, it's not possible to remove expired items, because the cache itself is not aware of the expiry date of the response.
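One way the periodic `cache.cull()` call could be wired up is with a small background thread. This is only a sketch: `start_periodic_cull` is a hypothetical helper, and it assumes nothing about the cache beyond a `cull()` method. It does nothing about the expired-entry limitation just mentioned.

```python
import threading


def start_periodic_cull(cache, interval_seconds=3600.0):
    """Call cache.cull() every interval_seconds on a daemon thread.

    Hypothetical helper; returns a threading.Event that stops the loop when set.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            cache.cull()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Usage would be `stop = start_periodic_cull(cache)` after creating the session, and `stop.set()` on shutdown.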
I got the FileCache working, then decided to try the diskcache FanoutCache because I wanted the cull functionality. In testing, however, the FanoutCache does not appear to be fully utilized: files (cache.db) are created in the appropriate directories, but they are never populated with data. I went back to the FileCache for now, as it is working fine.
@tedivm You are right, I checked and while the cache.db files are being created they never store any data. It turns out that the cache object from diskcache implements a __len__ method which returns 0 if there are no cache entries. CacheControl checks the incoming cache argument and falls back to DictCache if it's falsy.
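To illustrate the pitfall without either library installed, here is a self-contained sketch. `EmptyLenCache`, `pick_cache`, and `pick_cache_fixed` are stand-ins invented for this example. The `cache or ...` form is an assumption about the shape of the check, not a quote of CacheControl's code.

```python
class EmptyLenCache:
    """Stand-in for a cache that defines __len__ but not __bool__."""

    def __init__(self):
        self.data = {}

    def __len__(self):
        return len(self.data)


FALLBACK = "fallback DictCache"  # placeholder for the default in-memory cache


def pick_cache(cache=None):
    # Assumed shape of the falsy check: an empty cache has len() == 0,
    # so it is treated as false and silently replaced by the fallback.
    return cache or FALLBACK


def pick_cache_fixed(cache=None):
    # Comparing against None keeps an empty-but-valid cache object.
    return FALLBACK if cache is None else cache


empty = EmptyLenCache()
print(pick_cache(empty) is empty)        # False
print(pick_cache_fixed(empty) is empty)  # True
```

Python falls back to `__len__` for truth testing when a class defines no `__bool__`, which is why a freshly created, empty cache is discarded by the first form.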
I've created pull requests for both projects to correct this. (https://github.com/grantjenks/python-diskcache/pull/77, https://github.com/ionrock/cachecontrol/pull/195)
In the meantime, if you're still interested in trying out the FanoutCache, you could subclass it and patch its boolean conversion behavior.
i.e.:
class MyFanoutCache(FanoutCache):
    def __bool__(self):  # Python 3
        return True
    __nonzero__ = __bool__  # Python 2