KopiLua

Encoding issues (known)

Open Dia opened this issue 2 years ago • 0 comments

There are known problems with encoding, and the usual advice is to use non-UTF-8 files so that everything works correctly. My problem is that I am parsing Lua binary (.lub) files which I don't create, so I can't just save them in another encoding. The issue is not limited to UTF-8 either: it affects any encoding other than the default one, such as EUC-KR (cp949), Shift-JIS (cp932), or Chinese (cp936 and cp950).

The problem is in the ToString() method of CharPtr.

public override string ToString()
{
	string result = "";
	for (int i = index; (i < chars.Length) && (chars[i] != '\0'); i++)
		result += chars[i];
	return result;
}

The line that appends the character to the result string breaks the encoding, because the char ends up limited to characters in the user's default Windows encoding. In my case this replaced a lot of characters with ?. Instead, I cast them to byte, which keeps all characters.

	// determine where the string ends (first '\0' or end of the buffer)
	int end;
	for (end = index; (end < chars.Length) && (chars[end] != '\0'); end++)
	{ }

	// copy the data from the char array to the byte array;
	// result starts at 0, so offset the write position by index
	byte[] result = new byte[end - index];
	for (int x = index; x < end; x++)
	{
		result[x - index] = (byte)chars[x];
	}

	// return the encoded string
	return Encoding.GetEncoding(1252).GetString(result);

This is not the best way to do it, and I am not sure whether GetEncoding(1252) is the correct choice here. It does return the original string to the caller, who can then translate it to a specific encoding:

»ç°ú => 사과

It seems like the default encoding is 1251, which does not contain 0x82-0x9F and 0xAD (at least some of them are missing), so they get replaced with ? (0x3F) instead. Converting the 1252 string to UTF-8 also works after this change.
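
For anyone who wants to do the translation step on the caller side, here is a rough sketch of what I mean, not tested against KopiLua itself. The class ReencodeSketch and the helper Reinterpret are made-up names for illustration, and code page 949 is just my case. It assumes ToString() decoded the bytes with code page 1252 as in the patch above. The RegisterProvider call is only needed on .NET Core / .NET 5+ (on older targets it comes from the System.Text.Encoding.CodePages package); on .NET Framework the line can probably be dropped.

```csharp
using System;
using System.Text;

static class ReencodeSketch
{
    // Hypothetical helper: turn the 1252-decoded string back into its
    // original bytes, then decode those bytes with the real code page.
    static string Reinterpret(string raw, int codePage)
    {
        byte[] bytes = Encoding.GetEncoding(1252).GetBytes(raw);
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    static void Main()
    {
        // Makes legacy code pages such as 1252 and 949 available
        // on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The string from the example above: bytes BB E7 B0 FA read as 1252.
        string raw = "\u00BB\u00E7\u00B0\u00FA";

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Reinterpret(raw, 949)); // EUC-KR (cp949)
    }
}
```

One caveat: 1252 leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so ISO-8859-1 (code page 28591), which maps every byte 1:1 to a char, might be the safer choice for the round trip. It would have to be used on both sides.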

Dia, Aug 08 '23 16:08