
Setting max-age on the response causes cachecontrol to fail with exception about 'expires' argument

Open svanoort opened this issue 4 years ago • 7 comments

Applies to cachecontrol 0.12.7 and probably 0.12.8

When we have a response with following header: 'Cache-Control': 'public, max-age=1'

I believe this may be related to this PR: https://github.com/ionrock/cachecontrol/pull/233

We get the following exception from cachecontrol:

TypeError: set() got an unexpected keyword argument 'expires'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.9/site-packages/requests/sessions.py:555: in get
    return self.request('GET', url, **kwargs)
/usr/local/lib/python3.9/site-packages/requests/sessions.py:542: in request
    resp = self.send(prep, **send_kwargs)
/usr/local/lib/python3.9/site-packages/requests/sessions.py:697: in send
    r.content
/usr/local/lib/python3.9/site-packages/requests/models.py:836: in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
/usr/local/lib/python3.9/site-packages/requests/models.py:758: in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
/usr/local/lib/python3.9/site-packages/urllib3/response.py:576: in stream
    data = self.read(amt=amt, decode_content=decode_content)
/usr/local/lib/python3.9/site-packages/urllib3/response.py:519: in read
    data = self._fp.read(amt) if not fp_closed else b""
/usr/local/lib/python3.9/site-packages/cachecontrol/filewrapper.py:96: in read
    self._close()
/usr/local/lib/python3.9/site-packages/cachecontrol/filewrapper.py:76: in _close
    self.__callback(result)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <cachecontrol.controller.CacheController object at 0x7f30ba331640>
request = <PreparedRequest [GET]>
response = <urllib3.response.HTTPResponse object at 0x7f30ba3179a0>, body = b''
status_codes = None
    def cache_response(self, request, response, body=None, status_codes=None):
        """
        Algorithm for caching requests.
    
        This assumes a requests Response object.
        """
        # From httplib2: Don't cache 206's since we aren't going to
        #                handle byte range requests
        cacheable_status_codes = status_codes or self.cacheable_status_codes
        if response.status not in cacheable_status_codes:
            logger.debug(
                "Status code %s not in %s", response.status, cacheable_status_codes
            )
            return
    
        response_headers = CaseInsensitiveDict(response.headers)
    
        if 'date' in response_headers:
            date = calendar.timegm(
                parsedate_tz(response_headers['date'])
            )
        else:
            date = 0
    
        # If we've been given a body, our response has a Content-Length, that
        # Content-Length is valid then we can check to see if the body we've
        # been given matches the expected size, and if it doesn't we'll just
        # skip trying to cache it.
        if (
            body is not None
            and "content-length" in response_headers
            and response_headers["content-length"].isdigit()
            and int(response_headers["content-length"]) != len(body)
        ):
            return
    
        cc_req = self.parse_cache_control(request.headers)
        cc = self.parse_cache_control(response_headers)
    
        cache_url = self.cache_url(request.url)
        logger.debug('Updating cache with response from "%s"', cache_url)
    
        # Delete it from the cache if we happen to have it stored there
        no_store = False
        if "no-store" in cc:
            no_store = True
            logger.debug('Response header has "no-store"')
        if "no-store" in cc_req:
            no_store = True
            logger.debug('Request header has "no-store"')
        if no_store and self.cache.get(cache_url):
            logger.debug('Purging existing cache entry to honor "no-store"')
            self.cache.delete(cache_url)
        if no_store:
            return
    
        # https://tools.ietf.org/html/rfc7234#section-4.1:
        # A Vary header field-value of "*" always fails to match.
        # Storing such a response leads to a deserialization warning
        # during cache lookup and is not allowed to ever be served,
        # so storing it can be avoided.
        if "*" in response_headers.get("vary", ""):
            logger.debug('Response header has "Vary: *"')
            return
    
        # If we've been given an etag, then keep the response
        if self.cache_etags and "etag" in response_headers:
            expires_time = 0
            if response_headers.get('expires'):
                expires = parsedate_tz(response_headers['expires'])
                if expires is not None:
                    expires_time = calendar.timegm(expires) - date
    
            expires_time = max(expires_time, 14 * 86400)
    
            logger.debug('etag object cached for {0} seconds'.format(expires_time))
            logger.debug("Caching due to etag")
            self.cache.set(
                cache_url,
                self.serializer.dumps(request, response, body),
                expires=expires_time
            )
    
        # Add to the cache any permanent redirects. We do this before looking
        # that the Date headers.
        elif int(response.status) in PERMANENT_REDIRECT_STATUSES:
            logger.debug("Caching permanent redirect")
            self.cache.set(cache_url, self.serializer.dumps(request, response, b''))
    
        # Add to the cache if the response headers demand it. If there
        # is no date header then we can't do anything about expiring
        # the cache.
        elif "date" in response_headers:
            date = calendar.timegm(
                parsedate_tz(response_headers['date'])
            )
            # cache when there is a max-age > 0
            if "max-age" in cc and cc["max-age"] > 0:
                logger.debug("Caching b/c date exists and max-age > 0")
                expires_time = cc['max-age']
>               self.cache.set(
                    cache_url,
                    self.serializer.dumps(request, response, body),
                    expires=expires_time
                )
E               TypeError: set() got an unexpected keyword argument 'expires'
/usr/local/lib/python3.9/site-packages/cachecontrol/controller.py:354: TypeError
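The failure reduces to a signature mismatch: cachecontrol now passes `expires=` to the backend's `set()`, so any backend written against the older `BaseCache` interface raises `TypeError`. A minimal sketch of the mismatch (the `OldCache` class and values are illustrative, not taken from cachecontrol itself):

```python
class OldCache:
    """A backend implementing the pre-0.12.7 interface: no expires parameter."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value


cache = OldCache()
error = None
try:
    # This is effectively what CacheController.cache_response() now does
    # when the response carries a positive max-age:
    cache.set("http://example.com/", b"payload", expires=1)
except TypeError as exc:
    error = exc
    print(exc)  # e.g. "set() got an unexpected keyword argument 'expires'"
```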

svanoort avatar Oct 29 '21 21:10 svanoort

CC @ionrock I suspect this one is straightforward, but am not sure if there's any other supporting information needed here? (other library versions etc)

svanoort avatar Oct 29 '21 21:10 svanoort

@svanoort I'm not able to reproduce this in a test with the headers you mentioned. Do you mind sending the actual response headers? Also, are you using any heuristics or a custom cache implementation?

ionrock avatar Oct 31 '21 03:10 ionrock

@svanoort what backend are you using?

Using diskcache's FanoutCache here and had to shim it:

    class FanoutCache(_FanoutCache):
        def set(self, key, value, **kwargs):
            kwargs['expire'] = kwargs.pop('expires', None)
            return super().set(key, value, **kwargs)

Alternatively, you can just pop the keyword and drop it, so entries never expire in the backend.
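The same idea works for any custom backend: accept the new keyword and either forward it or ignore it. A minimal in-memory sketch (the `DictCache` name is illustrative; a real backend could use `expires` to evict stale entries):

```python
class DictCache:
    """In-memory cache backend compatible with both old and new cachecontrol."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, expires=None):
        # Accept (and here ignore) the expires= kwarg that newer
        # cachecontrol releases pass for responses with max-age/Expires.
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)
```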

ziddey avatar Oct 31 '21 21:10 ziddey

I'm testing the new release with pip and hit the same issue; I had to add an expires keyword to pip's custom cache:

https://github.com/itamarst/pip/commit/99a0bcf542c07fa6ccc3d2e198f4d593fec7f1b6

itamarst avatar Nov 03 '21 13:11 itamarst

I'm having the same issue with PR #233

This PR breaks those of us who use fasteners instead of the lockfile module, which is deprecated and has not been updated since 2015. Both fasteners and oslo.concurrency are its recommended replacements:

Note: This package is deprecated. It is highly preferred that instead of using this code base that instead fasteners or oslo.concurrency is used instead.

However, neither of those accepts an expires= argument on its .lock() method interface.

See also https://github.com/ionrock/cachecontrol/issues/109 which I appear to have opened a few years ago. 😁

An alternative referenced there is https://github.com/tox-dev/py-filelock.git - which also does not support the concept of expiry.

dsully avatar Nov 08 '21 21:11 dsully

i don't know when this was introduced, but this also causes a regression in feed2exec: https://gitlab.com/anarcat/feed2exec/-/issues/22

basically, the cache backend's set() method is now expected to accept an expires parameter, which is new. it's not mentioned in the release notes, for what it's worth, and the API change wasn't signaled by a major version bump either... 0.12.6 worked fine and 0.12.10 (or earlier?) broke it.

i patched it with: https://gitlab.com/anarcat/feed2exec/-/commit/c5563d093cfd6e86123438cacee9358ac491cbc7.patch

but it would be nice to have a little heads up for things like this. :)

anarcat avatar Dec 10 '21 20:12 anarcat

oh, and thanks for implementing this feature, quite nice!

anarcat avatar Dec 10 '21 20:12 anarcat