Simplified Chinese

dan2468 opened this issue

Should ftfy be able to fix “ÓůÓĂČíĽţ(YYRJ)”? (It should be a person’s name in Chinese script.)

Jan 22 '18 08:01 dan2468

The text is "御用软件(YYRJ)", right? (That's what you get by encoding the garbled text as Windows-1250 and decoding the resulting bytes as GBK.)
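
For anyone who wants to reproduce this by hand, here is a minimal sketch using only Python's built-in codecs (no ftfy involved). It assumes, as above, that the original bytes were GBK and were wrongly decoded as Windows-1250.

```python
# Mojibake from the issue: GBK bytes that were wrongly decoded as Windows-1250.
garbled = "ÓůÓĂČíĽţ(YYRJ)"

# Undo the wrong decode: re-encode the characters as Windows-1250 to recover the
# original bytes, then decode those bytes with the encoding they were really in (GBK).
repaired = garbled.encode("windows-1250").decode("gbk")
print(repaired)  # 御用软件(YYRJ)

# Going the other way (GBK bytes read as Windows-1250) reproduces the mojibake.
assert "御用软件(YYRJ)".encode("gbk").decode("windows-1250") == garbled
```

This only works because we already know which two encodings were involved; the hard part is guessing them from the garbled text alone.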

This is similar to #4, but because GBK is a multi-byte character set, in which many byte sequences are not valid, it is at least conceivable that the ftfy library could deal with it.
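
To illustrate why a multi-byte target encoding helps, here is a brute-force sketch: try several (wrong decoding, right decoding) pairs and keep only those whose round trip succeeds. The candidate lists and the `candidate_repairs` helper are hypothetical, chosen for illustration; this is not how ftfy actually decides on a fix.

```python
# Hypothetical candidate lists, chosen for illustration; ftfy does not work this way.
WRONG_DECODINGS = ["windows-1250", "iso-8859-2", "windows-1252"]
RIGHT_DECODINGS = ["gbk", "utf-8"]


def candidate_repairs(text):
    """Return (wrong, right, repaired) for every pair whose round trip succeeds."""
    results = []
    for wrong in WRONG_DECODINGS:
        try:
            raw = text.encode(wrong)
        except UnicodeEncodeError:
            continue  # text uses characters this single-byte encoding cannot produce
        for right in RIGHT_DECODINGS:
            try:
                results.append((wrong, right, raw.decode(right)))
            except UnicodeDecodeError:
                continue  # bytes break the multi-byte structure of `right`
    return results


for wrong, right, repaired in candidate_repairs("ÓůÓĂČíĽţ(YYRJ)"):
    print(f"{wrong} -> {right}: {repaired}")
```

Here the UTF-8 candidates are rejected because a lead byte such as 0xD3 must be followed by a continuation byte (0x80–0xBF), and Windows-1252 is rejected because it has no ů. Whatever survives still needs a plausibility check before it could be trusted as a fix.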

The problem is that the text was decoded as Windows-1250, the Central/Eastern European encoding that is producing letters like ů. Windows-1250 often creates a mess of ambiguity (as it does in #4) because it is so similar to ISO-8859-2. For that reason, I don't think ftfy will ever be able to disentangle Windows-1250 from arbitrary other encodings. Do you have any control over the data source that is decoding text from many different languages as if it were Windows-1250?
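
To get a feel for how close Windows-1250 and ISO-8859-2 are, the sketch below compares every byte from 0x80 to 0xFF under both encodings using Python's built-in codecs. `compare_high_bytes` is a hypothetical helper written for this illustration, not part of ftfy.

```python
def compare_high_bytes(enc_a="windows-1250", enc_b="iso-8859-2"):
    """Split bytes 0x80-0xFF into those both encodings decode identically and the rest."""
    same, different = [], []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        try:
            a = raw.decode(enc_a)
        except UnicodeDecodeError:
            a = None  # byte is unassigned in enc_a (Windows-1250 has a few gaps)
        try:
            b = raw.decode(enc_b)
        except UnicodeDecodeError:
            b = None  # byte is unassigned in enc_b
        if a is not None and a == b:
            same.append(value)
        else:
            different.append(value)
    return same, different


same, different = compare_high_bytes()
print(f"{len(same)} of 128 high bytes decode identically")
print("differing bytes:", " ".join(f"{v:02X}" for v in different))
```

In the string from this issue, the only byte the two encodings disagree on is 0xBC (Ľ in Windows-1250, ź in ISO-8859-2), so the same GBK bytes read as ISO-8859-2 would have shown up as "ÓůÓĂČíźţ". Telling those two readings apart from the text alone is exactly the kind of ambiguity described above.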

Jan 23 '18 17:01 rspeer