Unicode Error

Open · sjb554 opened this issue on Nov 14 '16 · 5 comments

I am only getting this error once in a while, but it looks like this:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 7: invalid continuation byte

Can this be solved by changing the requirements.txt file? Or, is some other solution appropriate here?

Thanks, SJB

sjb554 commented on Nov 14 '16

It sounds like the file in question might not be UTF-8. You say it only happens once in a while; are the sources different? When a file is encoded improperly, many text editors can detect the encoding and open it regardless. Some, like TextMate, let you 'Save As' UTF-8.
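
For example, a minimal sketch of that kind of detection in Python (read_any is a hypothetical helper, not part of json2csv; it tries UTF-8 first and falls back to Latin-1, which maps every byte and so always decodes):

```python
import io

def read_any(path):
    # Try UTF-8 first; if the file is mis-encoded, fall back to Latin-1,
    # which decodes any byte sequence without raising UnicodeDecodeError.
    try:
        with io.open(path, encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        with io.open(path, encoding='latin-1') as f:
            return f.read()
```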

evidens commented on Nov 16 '16

I discovered it was only letters such as 'Í' and 'Ó'. My files were large, but none so large that I couldn't go in and replace the characters manually (the largest being a little over 1 GB).

I am thinking that if it comes up more frequently, I would need to make some kind of scrubbing program to change 'Ó' to 'O' and so on.
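
A minimal sketch of such a scrubber (byte 0xcd from the traceback above is 'Í' in Latin-1/Windows-1252, so the sketch assumes the files are Latin-1; scrub_file is a hypothetical helper built on the standard-library unicodedata module):

```python
import io
import unicodedata

def scrub_file(src_path, dst_path):
    # Assumption: the source file is Latin-1; that codec maps every byte,
    # so reading never raises.
    with io.open(src_path, encoding='latin-1') as src:
        text = src.read()
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the marks ('Ó' -> 'O').
    scrubbed = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(scrubbed.decode('ascii'))
```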

Thanks for all the great answers, SJB

sjb554 commented on Nov 17 '16

I would check how you're saving your files. If they're properly encoded in UTF-8, json2csv should support extended character sets (I'm fairly sure I've tested it with French input in the past).

evidens commented on Nov 17 '16

That makes sense. My download and save code is not very robust:

```python
import os
import urllib2

def save_json(url):
    # Build a filesystem-safe filename from the URL.
    filename = (url.replace('/', '').replace(':', '')
                   .replace('.', '|').replace('|json', '.json')
                   .replace('|JSON', '.json').replace('Json', '.json')
                   .replace('|', '').replace('?', '').replace('=', '')
                   .replace('&', '').replace('_', '').replace('-', ''))
    path = "C:/xxx/json"
    fullpath = os.path.join(path, filename)

    # Download the response and write the raw bytes straight to disk.
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open(fullpath, 'w')
    f.write(webContent)
    f.close()

f = open('U:/xxx/url_list.txt')
url_list = f.read().split('\n')  # '\n' is the line-break delimiter; change it if needed
f.close()

for url in url_list:
    save_json(url)
```

sjb554 commented on Nov 17 '16

Use io.open as in this example, http://stackoverflow.com/a/14870531, and the files will be written out as UTF-8.
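
For instance, a minimal sketch of that change applied to the save_json function above (the trimmed-down signature is illustrative, and it assumes the downloaded responses really are UTF-8):

```python
import io
import urllib2

def save_json(url, fullpath):
    response = urllib2.urlopen(url)
    # Decode the downloaded bytes explicitly; if a source is actually
    # Latin-1 rather than UTF-8, use .decode('latin-1') here instead.
    webContent = response.read().decode('utf-8')
    # io.open with an explicit encoding writes the text back out as UTF-8.
    with io.open(fullpath, 'w', encoding='utf-8') as f:
        f.write(webContent)
```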

evidens commented on Nov 17 '16