cachecontrol
cachecontrol copied to clipboard
UnicodeDecodeError on some utf8 content in headers in cachecontrol
Hi again, I found another similar case like #84 but not exactly the same place.
URL that is failing: http://wizard2.sbs.co.kr/w3/podcast/V0000372136.xml
Specifically, the decoding chokes on the unicode chars in this header:
... 'p3p': "CP='\xb0\xa3\xb7\xab\xb9\xe6\xc4\xa7\xb1\xe2\xc8\xa3'" ...
Stacktrace:
File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 476, in get
return self.request('GET', url, **kwargs)
File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/opbeat/instrumentation/packages/base.py", line 63, in __call__
args, kwargs)
File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/opbeat/instrumentation/packages/base.py", line 214, in call_if_samp
ling
return wrapped(*args, **kwargs)
File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.virtualenvs/webserver/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/adapter.py", line 36, in send
cached_response = self.controller.cached_request(request)
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/controller.py", line 102, in cached_request
resp = self.serializer.loads(request, self.cache.get(cache_url))
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 114, in loads
return getattr(self, "_loads_v{0}".format(ver))(request, data)
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 180, in _loads_v2
for k, v in cached["response"]["headers"].items()
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 180, in <genexpr>
for k, v in cached["response"]["headers"].items()
File "/home/ubuntu/.virtualenvs/webserver/src/cachecontrol-master/cachecontrol/serialize.py", line 30, in _b64_decode_str
return _b64_decode_bytes(s).decode("utf8")
File "/home/ubuntu/.virtualenvs/webserver/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 4: invalid start byte