UnicodeDecodeError on some utf8 content in headers in cachecontrol

Status: Open • hakanw opened this issue 10 years ago • 0 comments

Hi again, I found another case similar to #84, but it fails in a different place.

URL that is failing: http://wizard2.sbs.co.kr/w3/podcast/V0000372136.xml

Specifically, decoding chokes on the non-ASCII bytes in this header. They are not valid UTF-8 and appear to be Korean text in a legacy encoding (EUC-KR):

... 'p3p': "CP='\xb0\xa3\xb7\xab\xb9\xe6\xc4\xa7\xb1\xe2\xc8\xa3'" ...
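A quick standard-library check (Python 3) of why that value breaks a UTF-8 decode. `0xb0` is a UTF-8 continuation byte, so it cannot start a sequence. It sits at index 4, right after `CP='`, which matches the error at the bottom of the trace. The EUC-KR reading is an inference from the byte pattern, not something the server declares.

```python
# Raw bytes of the p3p header value quoted above.
raw = b"CP='\xb0\xa3\xb7\xab\xb9\xe6\xc4\xa7\xb1\xe2\xc8\xa3'"

# Strict UTF-8 decoding rejects the first non-ASCII byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))  # -> 4 0xb0

# The same bytes decode cleanly as EUC-KR (assumed encoding): six Hangul syllables.
print(raw.decode("euc-kr"))
```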

Stacktrace:

  File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 476, in get
    return self.request('GET', url, **kwargs)
  File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/opbeat/instrumentation/packages/base.py", line 63, in __call__
    args, kwargs)
  File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/opbeat/instrumentation/packages/base.py", line 214, in call_if_sampling
    return wrapped(*args, **kwargs)
  File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/adapter.py", line 36, in send
    cached_response = self.controller.cached_request(request)
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/controller.py", line 102, in cached_request
    resp = self.serializer.loads(request, self.cache.get(cache_url))
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 114, in loads
    return getattr(self, "_loads_v{0}".format(ver))(request, data)
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 180, in _loads_v2
    for k, v in cached["response"]["headers"].items()
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 180, in <genexpr>
    for k, v in cached["response"]["headers"].items()
  File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 30, in _b64_decode_str
    return _b64_decode_bytes(s).decode("utf8")
  File "/home/ubuntu/.virtualenvs/webserver/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 4: invalid start byte
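The last frames show where it fails. The cached headers are stored base64-encoded, and `_b64_decode_str` always decodes the result as UTF-8 (`_b64_decode_bytes(s).decode("utf8")`). The Python 3 sketch below reproduces that step outside cachecontrol and shows one possible workaround. The function names `b64_decode_str_utf8` and `b64_decode_str_tolerant` are hypothetical stand-ins, not cachecontrol's API, and this is not the project's actual fix. The fallback uses ISO-8859-1 (latin-1), the same mapping Python 3's `http.client` applies to raw header bytes. It assigns a code point to every byte, so it never fails and the original bytes can be recovered.

```python
import base64


def b64_decode_str_utf8(s):
    # Mirrors the failing line in the trace: base64-decode, then strict UTF-8 decode.
    return base64.b64decode(s).decode("utf8")


def b64_decode_str_tolerant(s):
    # Hypothetical workaround: try UTF-8 first, fall back to latin-1,
    # which maps each byte 0x00-0xff to one code point and cannot fail.
    data = base64.b64decode(s)
    try:
        return data.decode("utf8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


# Simulate the cached p3p header: raw header bytes stored as base64.
raw = b"CP='\xb0\xa3\xb7\xab\xb9\xe6\xc4\xa7\xb1\xe2\xc8\xa3'"
stored = base64.b64encode(raw)

try:
    b64_decode_str_utf8(stored)
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)  # -> strict decode fails: invalid start byte

value = b64_decode_str_tolerant(stored)
# latin-1 round-trips, so the original bytes are still recoverable.
assert value.encode("latin-1") == raw
```

A UTF-8-then-latin-1 fallback is ambiguous on the way back out, because the caller cannot tell which decoding was used. Always using latin-1 on both the store and load side would avoid that, at the cost of changing how existing UTF-8 values read back.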

hakanw • Aug 05 '15 15:08