Does it support utf8?

Open Ailtop opened this issue 1 year ago • 12 comments

Load utf8 string, display incorrectly

Ailtop avatar Aug 14 '24 10:08 Ailtop

Could you provide more information? A reproducible bit of code at least would be helpful to understand what you are trying to do.

tilkinsc avatar Aug 15 '24 03:08 tilkinsc

var L = luaL_newstate();
luaL_openlibs(L);
{
	// luaL_dostring(L, "config = {'您好'}"); // Direct dostring is correct
	luaL_dofile(L, @"D:\main.lua"); // but dofile is not correct; file encoding is UTF-8, file content is "config = {'您好'}"
	lua_getglobal(L, "config");
	lua_pushinteger(L, 1);
	lua_gettable(L, -2);
	Console.WriteLine(lua_tostring(L, -1));
}
lua_close(L);

Ailtop avatar Aug 15 '24 04:08 Ailtop

This is weird behavior because that directly calls a native function within the dll library. What version of Lua were you using?

tilkinsc avatar Aug 15 '24 05:08 tilkinsc

Lua53

Ailtop avatar Aug 15 '24 05:08 Ailtop

luaL_dostring(L, File.ReadAllText(@"D:\main.lua")); // this is also correct

It looks like luaL_dofile is the part that is incorrect.

Ailtop avatar Aug 15 '24 05:08 Ailtop

It seems that in Lua 5.3, luaL_dofile calls luaL_loadfile, which calls luaL_loadfilex with mode null (just as stock Lua does), which in turn calls lua_load inside the native DLL. If loading fails, it leaves an error message on the stack. Could you tostring that error? Perhaps it will give us more information. https://www.lua.org/manual/5.3/manual.html#lua_load

More on mode, from the manual: "The string mode controls whether the chunk can be text or binary (that is, a precompiled chunk). It may be the string "b" (only binary chunks), "t" (only text chunks), or "bt" (both binary and text). The default is "bt"."

tilkinsc avatar Aug 15 '24 05:08 tilkinsc

There is no error in luaL_dofile. I've given you all the code to try out.

Ailtop avatar Aug 15 '24 07:08 Ailtop

I got around to trying it out, and have confirmed the issue. Investigating.

tilkinsc avatar Aug 28 '24 07:08 tilkinsc

image

..................

Can't fix without modifying source...... (which means rolling custom code lib-side D:)

This confirms the issue with lua_loadstring

var L = luaL_newstate();
luaL_openlibs(L);
{
	byte[] bytes = File.ReadAllBytes("./main.lua");
	string str = Encoding.UTF8.GetString(bytes);
	luaL_loadbufferx(L, str, (ulong)str.Length, "chunk", "bt");
	for (int i=0; i<bytes.Length; i++)
	{
		Console.Write($"{bytes[i]:x} ");
	}
	lua_call(L, 0, 0);
	lua_getglobal(L, "config");
	lua_pushinteger(L, 1);
	lua_gettable(L, -2);
	string? str2 = lua_tostring(L, -1);
	Console.WriteLine(str2);
}
lua_close(L);

As for luaL_dofile, something is emitting incorrect data on the Lua side. Lua can handle UTF-8 and UTF-8 with a BOM, but apparently this code path can't. The safest way to load code is to call luaL_loadbuffer or luaL_loadbufferx directly. I confirmed this is a Lua-owned bug using my REPL app, which can execute files and strings in order: the fault occurs there in exactly the same way.

I am open to anyone's contributions for the solution!

Temporary Patch:

// does not handle stdin
// does not handle shebangs maybe i.e. #!/bin/lua
// not sure if it handles compiled lua?
public static int _luaL_dofile(lua_State L, string fn)
{
	string script = File.ReadAllText(fn);
	int status = luaL_loadbufferx(L, script, (ulong) script.Length, $"@{fn}", "bt");
	if (status > 0)
		return status;
	return lua_pcall(L, 0, LUA_MULTRET, 0);
}
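
One of the limitations listed in the patch comments is shebang handling. The Lua manual says luaL_loadfile ignores the first line of a file if it starts with `#`, so a replacement that goes through a buffer would need to do the same. Below is a minimal sketch of that step in plain .NET. `SkipShebang` is a hypothetical helper name, not part of Lua.NET, and keeping the newline so line numbers stay aligned is a design choice assumed here.

```csharp
using System;

// Hypothetical helper (not part of Lua.NET): drops a leading '#' line from
// Lua source text, keeping the newline so later line numbers do not shift.
static string SkipShebang(string source)
{
    // Nothing to do unless the very first character is '#'.
    if (source.Length == 0 || source[0] != '#')
        return source;

    int newline = source.IndexOf('\n');

    // A file that is only a shebang line has no code left.
    return newline < 0 ? string.Empty : source.Substring(newline);
}

string withShebang = "#!/bin/lua\nprint('hi')";
string stripped = SkipShebang(withShebang);

// The remaining text starts with the original newline.
Console.WriteLine(stripped.StartsWith("\n"));
Console.WriteLine(SkipShebang("x = 1") == "x = 1");
```

The result could then be handed to whatever loader is used in place of luaL_dofile. This sketch does not attempt to handle precompiled chunks.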

tilkinsc avatar Aug 28 '24 09:08 tilkinsc

When mapping from C to C#, 'const char *buff' should correspond to 'byte[] buff', not 'string buff'. Using string results in the wrong size being passed.

public static int _luaL_dofile(lua_State L, string fn)
{
	string script = File.ReadAllText(fn);
	int status = luaL_loadbufferx(L, script, (ulong) script.Length, $"@{fn}", "bt"); // script.Length is incorrect, when I change it to byte[] it's correct.
	if (status > 0)
		return status;
	return lua_pcall(L, 0, LUA_MULTRET, 0);
}
public static extern int luaL_loadbufferx(lua_State L, string buff, size_t sz, string name, string? mode);

LUALIB_API int luaL_loadbufferx (lua_State *L, const char *buff, size_t size,
                                 const char *name, const char *mode) {
  LoadS ls;
  ls.s = buff;
  ls.size = size;
  return lua_load(L, getS, &ls, name, mode);
}
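
To make the size mismatch concrete, here is a small sketch that uses only the .NET standard library (no Lua.NET calls) and the script from the original report. It compares `string.Length`, which counts UTF-16 code units, with the number of bytes the same text takes up in UTF-8, which is what a native `size_t size` argument has to describe if the buffer is UTF-8 encoded.

```csharp
using System;
using System.Text;

// The Lua source from the report: two CJK characters inside a table constructor.
string script = "config = {'您好'}";

// string.Length counts UTF-16 code units, not bytes.
int utf16Units = script.Length;

// The UTF-8 encoding of the same text; each of the two CJK characters takes 3 bytes.
byte[] utf8 = Encoding.UTF8.GetBytes(script);
int utf8Bytes = utf8.Length;

Console.WriteLine($"UTF-16 code units: {utf16Units}"); // 15
Console.WriteLine($"UTF-8 bytes:       {utf8Bytes}");  // 19
```

For pure ASCII scripts the two numbers are the same, which is why the problem only shows up with non-ASCII content. With this script, passing 15 as the size of a 19-byte UTF-8 buffer would cut the chunk off partway through the second character.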

Ailtop avatar Sep 03 '24 04:09 Ailtop

That might depend on the mode "bt", actually, but I am certainly not walking the code to find out what specifically is going on. Lua traditionally has a very bad source. I will accept PRs fixing legitimate Lua bugs like this one as long as it doesn't touch the actual lua source.

image

If you could give it a try with both versions and see if it actually matters that would be great.

tilkinsc avatar Sep 04 '24 02:09 tilkinsc

@tilkinsc seems like this could help

Console.OutputEncoding = Encoding.UTF8;


var L = Lua.luaL_newstate();

Lua.luaL_openlibs(L);

unsafe
{
    var chunk = Encoding.UTF8.GetBytes("""
    print("Hello, 世界!")  
    X = "Hello, 世界!"  
    """);
    Lua.luaL_loadbufferx(L, chunk, (nuint)chunk.Length, "main", "bt");
}

var r = Lua.lua_pcallk(L, 0, 0, 0, 0, 0);
// reading X
Lua.lua_getglobal(L, "X");

TryReadString(L, -1, out var str2);
Console.WriteLine(str2);
Console.WriteLine("Hello, 世界!");

Lua.lua_close(L);

And here is the helper:

unsafe static bool TryReadString(in IntPtr L, in int index, [NotNullWhen(true)] out string? str)
{
    str = null;
    if (Lua.lua_type(L, index) != 4) // 4 is LUA_TSTRING
        return false;

    var x = Lua.luaL_tolstring(L, index, out var len);
    str = Encoding.UTF8.GetString((byte*)x, (int)len);
    return true;
}
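
The helper decodes the native bytes explicitly with Encoding.UTF8. The sketch below, which uses only the .NET standard library, shows why the choice of decoder matters. It takes the UTF-8 bytes of the test string and decodes them once as UTF-8 and once as Latin-1. Latin-1 is only a stand-in for a single-byte ANSI code page here: the thread does not confirm which encoding Lua.NET's default string marshaling uses, so treat that part as an assumption. `Encoding.Latin1` requires .NET 5 or later.

```csharp
using System;
using System.Text;

string original = "Hello, 世界!";

// These are the bytes a Lua state would hold for this string
// if the chunk was loaded as UTF-8.
byte[] raw = Encoding.UTF8.GetBytes(original);

// Decoding with the matching encoding round-trips the text.
string asUtf8 = Encoding.UTF8.GetString(raw);

// Decoding the same bytes with a single-byte encoding turns every byte
// into its own character, so the two CJK characters become six.
string asLatin1 = Encoding.Latin1.GetString(raw);

Console.WriteLine(asUtf8 == original);   // True
Console.WriteLine(asLatin1 == original); // False
Console.WriteLine(asLatin1.Length);      // 14, one char per byte
```

If the binding decodes returned strings with a single-byte encoding, this is the kind of garbling one would expect to see on output, even when the Lua state itself holds the correct bytes.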

ParadiseFallen avatar Jan 29 '25 12:01 ParadiseFallen