UnicodeDecodeError

Open sspina opened this issue 7 years ago • 1 comments

Hello,

When I try to parse a corpus, I get the following error message:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 33: ordinal not in range(128)

I attach the log file.

Thank you for your help,

Stefania

log-02.txt https://github.com/interrogator/corpkit/files/1975812/log-02.txt

May 04 '18 20:05 sspina

Hey,

Sorry, since corpkit I’ve more or less moved on to other projects, and I don’t know if I’ll have time to make any needed fix.

The parsing error seems to be caused by character encodings in the text. That is, there are probably non-ASCII characters in there, like umlauts or accented letters, and something is trying to decode them as plain ASCII.
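To make that concrete, here is a minimal sketch of how this kind of error arises. It is not corpkit code, and the sample string is made up rather than taken from the attached log; `try_ascii` is a hypothetical helper name. The sample uses an "e" followed by a combining grave accent (U+0300), which UTF-8 encodes as the bytes 0xcc 0x80, so decoding it as ASCII fails on a 0xcc byte just like in the report.

```python
# Illustrative sample only (an assumption, not from the attached log):
# "e" + combining grave accent U+0300, which UTF-8 encodes as 0xcc 0x80.
sample = u"perche\u0300".encode("utf-8")


def try_ascii(data):
    """Return the decode error message if data is not pure ASCII, else None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return None


print(try_ascii(sample))     # an "'ascii' codec can't decode byte 0xcc ..." message
print(try_ascii(b"perche"))  # None: plain ASCII decodes without trouble
```

Decoding the same bytes with `sample.decode("utf-8")` succeeds, which is why the usual remedy is to state the encoding explicitly instead of relying on an ASCII default.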

See here for more information: https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte
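As a possible workaround on the corpus side, something like the sketch below could help locate the files that contain non-ASCII bytes, so they can be inspected or re-saved before parsing. This is only a sketch under assumptions: `find_non_ascii` and `read_utf8` are hypothetical helpers, not part of corpkit, and it assumes the corpus is a directory of text files that are actually encoded as UTF-8.

```python
import io
import os


def find_non_ascii(corpus_dir):
    """Yield (path, byte_offset) of the first non-ASCII byte in each file."""
    for root, _dirs, files in os.walk(corpus_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                # bytearray yields integers on both Python 2 and Python 3.
                data = bytearray(handle.read())
            for offset, byte in enumerate(data):
                if byte > 127:
                    yield path, offset
                    break


def read_utf8(path):
    """Read a file as text with an explicit encoding (assumed to be UTF-8)."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `list(find_non_ascii("path/to/corpus"))` (the path is a placeholder) would list each affected file with the offset of its first non-ASCII byte. If the files turn out to use a different encoding, such as Latin-1, `read_utf8` will raise its own decode error and the `encoding` argument would need to change accordingly.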

If I do manage to get back to this project I’ll bear this in mind.


May 05 '18 09:05 interrogator