Unicode Error
I am only getting this error once in a while, but it looks like this: UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 7: invalid continuation byte
Can this be solved by changing the requirements.txt file? Or, is some other solution appropriate here?
Thanks, SJB
It sounds like the file in question might not be UTF-8. You say it only happens once in a while: do the files come from different sources? When a file is not encoded as expected, many text editors can detect the encoding and open it anyway. Some, like TextMate, let you 'Save As' UTF-8.
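To illustrate the point about the file not being UTF-8, here is a minimal sketch that reproduces the same kind of error and then retries with a fallback encoding. The sample bytes, the helper name `decode_with_fallback`, and the choice of Latin-1 as the fallback are all assumptions for illustration; the thread does not say which encoding the files actually use.

```python
# Hypothetical sample: seven ASCII bytes followed by 0xCD, which is the letter
# 'Í' in Latin-1. In UTF-8, 0xCD starts a two-byte sequence, so the ASCII byte
# that follows it is reported as an invalid continuation byte.
raw = b"ABCDEFG\xcdN"


def decode_with_fallback(data, encodings=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(raw)
```

Latin-1 maps every byte to some character, so it never fails; that makes it a convenient last resort, but it can silently produce the wrong letters if the real encoding is something else.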
I discovered it was only letters such as 'Í' and 'Ó'. My files were large, but none so large that I couldn't manually go in and replace them (largest being a little over 1 GB).
I am thinking that if it comes up more frequently, I would need to make some kind of scrubbing program to change 'Ó' to 'O' and so on.
Thanks for all the great answers, SJB
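The scrubbing idea above (turning 'Ó' into 'O' and so on) could be sketched with the standard `unicodedata` module instead of a hand-written replacement table. The function name `strip_accents` is hypothetical, and the sketch assumes the text has already been decoded to a Unicode string; it is also lossy, since the accents cannot be recovered afterwards.

```python
import unicodedata


def strip_accents(text):
    """Return text with accented letters replaced by their base letters."""
    # NFD splits a letter such as 'Ó' into 'O' plus a combining accent mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


scrubbed = strip_accents(u"\u00cd and \u00d3")  # the input is 'Í and Ó'
```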
I would look at how you're saving your files. If they're properly encoded in UTF-8, extended character sets should be supported (I'm pretty sure I've tested it with French input in the past).
That makes sense. My download and save code is not very robust:
```
def save_json(url):
    import os
    filename = url.replace('/','').replace(':','').replace('.','|').replace('|json','.json').replace('|JSON','.json').replace('Json','.json').replace('|','').replace('?','').replace('=','').replace('&','').replace('_','').replace('-','')
    path = "C:/xxx/json"
    fullpath = os.path.join(path, filename)
    import urllib2
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open(fullpath, 'w')
    f.write(webContent)
    f.close()

f = open('U:/xxx/url_list.txt')
p = f.read()
url_list = p.split('\n') #here's where \n is the line break delimiter that can be changed
for url in url_list:
    save_json(url)
```
Use io.open as in this example: http://stackoverflow.com/a/14870531. The files are then written out as UTF-8.
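A rough sketch of what the `io.open` suggestion might look like follows. The helper names `save_text` and `load_text`, the `source_encoding` parameter, and the sample bytes are assumptions for illustration, not taken from the linked answer; in particular, it assumes the encoding of the downloaded bytes is known.

```python
import io
import os
import tempfile


def save_text(path, data, source_encoding="utf-8"):
    """Decode downloaded bytes and write them to path as UTF-8."""
    text = data.decode(source_encoding)
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read a UTF-8 encoded file back as a Unicode string."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical demo: Latin-1 input bytes end up stored as UTF-8 on disk.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
save_text(demo_path, b'{"city": "Bogot\xe1"}', source_encoding="latin-1")
```

`io.open` takes an explicit `encoding` argument on both Python 2 and Python 3, so the encoding of the written file no longer depends on platform defaults.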