[New bug] Certain characters like emojis passed from a client aren't received correctly by the server

Open ArranTuna opened this issue 1 year ago • 13 comments

Describe the bug

After updating the MTA server, I started getting warnings about failed MySQL inserts, such as:

[script] FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1 [Query:INSERT INTO my_history (time, acc, msg) VALUES (1717270392, 'Arran', '😁')]

I narrowed it down to r22396, which contains MySQL changes: https://buildinfo.multitheftauto.com/index.php?Revision=22396&Branch=

Steps to reproduce

  1. Windows 64 bit server with v1.6-release-22396 running.
  2. MySQL server you can dbConnect to.
  3. srun dbcon = dbConnect("mysql", "dbname=dbnamehere;host=localhost", "user", "password")
  4. srun dbExec(dbcon, "INSERT INTO table (message) VALUES (?)", "😁")
  5. Get error like FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1

Note that the MySQL table is set to utf8mb4, and I have also tried passing charset=utf8mb4 to dbConnect, but it makes no difference. Only reverting to the revision before this one fixes it. I checked what gets inserted in the previous version and MySQL Workbench shows: 😁 for 😁

Version

Windows 64 bit v1.6-release-22396

Additional context

No response

Relevant log output

No response

Security Policy

  • [X] I have read and understood the Security Policy and this issue is not security related.

ArranTuna avatar Jun 01 '24 20:06 ArranTuna

It also seems to have issues loading from the DB, as I was sent this:

KW5vs9q

ArranTuna avatar Jun 01 '24 20:06 ArranTuna

I can't reproduce it on r22493 (x64).

DB: image

Code:

dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")
dbExec(dbcon, "CREATE TABLE IF NOT EXISTS `table` (message TEXT)")
dbExec(dbcon, "INSERT INTO `table` (message) VALUES (?)", "😁")

Result: image

Another test

Code:

dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")

function query(...)
	local queryHandle = dbQuery(dbcon, ...)
	if (not queryHandle) then
		return nil
	end
	local rows = dbPoll(queryHandle, -1)
	return rows
end

iprint(query("SELECT message from `table`"))
iprint(query("SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'"))

Result:

INFO: { {
    message = "😁"
  } }
INFO: { {
    Value = "utf8mb4",
    Variable_name = "character_set_client"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_connection"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_database"
  }, {
    Value = "binary",
    Variable_name = "character_set_filesystem"
  }, {
    Value = "utf8mb4",
    Variable_name = "character_set_results"
  }, {
    Value = "latin1",
    Variable_name = "character_set_server"
  }, {
    Value = "utf8",
    Variable_name = "character_set_system"
  }, {
    Value = "utf8mb4_general_ci",
    Variable_name = "collation_connection"
  }, {
    Value = "utf8mb4_unicode_ci",
    Variable_name = "collation_database"
  }, {
    Value = "latin1_swedish_ci",
    Variable_name = "collation_server"
  } }

theSarrum avatar Jun 06 '24 13:06 theSarrum

Hmm, I wonder if the MySQL version running on the server being outdated could be causing the problem with the updated version of MTA. What MySQL version do you have running?

ArranTuna avatar Jun 06 '24 13:06 ArranTuna

I couldn't reproduce the issue. I've used MySQL Community Server 8.4.0 with a utf8mb4_bin collation table and an MTA Windows x64 server.

botder avatar Jun 06 '24 13:06 botder

The commit 31c68fd4e3ded499abf123d60ccf12d9de99555c is unrelated. I simply couldn't connect to my local MySQL server without this option.

botder avatar Jun 06 '24 14:06 botder

What MySQL version do you have running?

8.3.0, but I just tested on a very old version (5.7.23) and it's ok too.

theSarrum avatar Jun 06 '24 14:06 theSarrum

I know why this couldn't be reproduced: I just tried the same code in a script and it worked fine, but when run through runcode it fails. This is so weird, though, that this code in runcode would work fine on... and now it gets even weirder, as I can reproduce this bug on older versions that I'm sure worked fine. I'm going to try an older client version to see whether the problem is the MTA client sending garbled characters. I just tried the oldest available nightly, 22388, and that also gives the error.

I've found a possible clue:

srun utf8.byte("😁")
Command results: 55357 [number]

But when the same character is used in a script:

outputChatBox("UTF8.Byte: "..utf8.byte("😁"))
UTF8.Byte: 3824460688

So this bug may have nothing to do with MySQL. Instead, when a player enters certain characters in MTA through the console, GUI, etc., they are handled incorrectly.

I've also tried client-side runcode and a client-side file and got:

Executing client-side command: utf8.byte("😁")
Command results: 55357 [number]

In a client-side file: 1026006976

ArranTuna avatar Jun 07 '24 12:06 ArranTuna

This could be why:

srun utf8.byte("😁", 1, 2)
Command results: 55357 [number], 56833 [number]

Whereas the same call executed in a script file:

local a, b = utf8.byte("😁", 1, 2)
outputChatBox("utf8-byte: "..tostring(a).." "..tostring(b))
utf8-byte: 3964768944 nil

On the client, that one character is being split into two. I just wish there was a way to fix the MySQL insertion errors; what about utf8_bin?
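
The two numbers returned through runcode match the UTF-16 surrogate pair for 😁 (U+1F601). As a rough cross-check outside MTA, the standalone Python sketch below does the split by hand; `to_surrogate_pair` is a hypothetical helper name, not an MTA or Lua function.

```python
def to_surrogate_pair(codepoint):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1F601)  # U+1F601 is the emoji from the report
print(high, low)  # 55357 56833
```

Both values equal the ones printed by `srun utf8.byte("😁", 1, 2)`, which is consistent with the character arriving as two separate surrogate code points.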

ArranTuna avatar Jun 07 '24 12:06 ArranTuna

For your information, the MTA client has a broken UTF-16 to UTF-8 conversion when the UTF-16 string contains surrogate pairs. Use the function below to fix the broken UTF-8 string:

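-- Rebuilds code points that were wrongly encoded as two separate
-- UTF-16 surrogates (high 0xD800-0xDBFF, low 0xDC00-0xDFFF).
function utf8_decode_utf16_surrogate_pairs(text)
    local characters = {}
    local highSurrogate = 0 -- pending high surrogate, 0 when none
    local length = 0

    for position, codepoint in utf8.next, text do
        if highSurrogate > 0 then
            -- Only a low surrogate completes the pair; anything else is
            -- dropped together with the pending high surrogate.
            if codepoint >= 0xDC00 and codepoint <= 0xDFFF then
                -- Combine both halves into the original code point.
                codepoint = (highSurrogate - 0xD800) * 0x400 + (codepoint - 0xDC00) + 0x10000

                length = length + 1
                characters[length] = utf8.char(codepoint)
            end

            highSurrogate = 0
        elseif codepoint >= 0xD800 and codepoint <= 0xDBFF then
            -- Remember the high surrogate and wait for its low half.
            highSurrogate = codepoint
        else
            -- Ordinary code point (or stray low surrogate): copy it through.
            length = length + 1
            characters[length] = utf8.char(codepoint)
        end
    end

    return table.concat(characters)
end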
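
To sanity-check the idea outside MTA, here is a minimal Python sketch of the same repair. It uses a different technique from the Lua function above: it round-trips the string through Python's UTF-16 codec instead of walking code points. `repair_surrogate_pairs` is a hypothetical name, and unlike the Lua function this version raises an error on an unpaired surrogate rather than dropping it.

```python
def repair_surrogate_pairs(text):
    """Join separately stored UTF-16 surrogates into single code points."""
    # 'surrogatepass' lets lone surrogate code points through the encoder;
    # the UTF-16 decoder then reads each high/low pair as one character.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


broken = "\ud83d\ude01"           # the two code points seen through runcode
fixed = repair_surrogate_pairs(broken)
print(hex(ord(fixed)))            # 0x1f601
print(fixed.encode("utf-8"))      # b'\xf0\x9f\x98\x81'
```

The repaired string encodes to the four-byte UTF-8 sequence for U+1F601 instead of the six bytes shown in the MySQL error.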

botder avatar Jun 07 '24 12:06 botder

I see that character_set_server and collation_server are latin1. You can try changing the server's character set and collation to utf8mb4 by editing my.ini like this:

[mysqld]
collation_server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
character_set_server = utf8mb4

Daemant avatar Jun 24 '24 15:06 Daemant

Thanks for that code, botder. It works as a hack fix for this bug.

ArranTuna avatar Jun 24 '24 15:06 ArranTuna

If my function worked for you, then this isn't an actual issue with the database nor the libraries used, but with user input conversion to UTF-8. Your database protected you from inserting garbage UTF-8 bytes.
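
As an illustration of why those bytes count as garbage, the standalone Python sketch below feeds the byte sequence from the (1366) error message to a strict UTF-8 decoder. `is_valid_utf8` is a hypothetical helper, and Python's decoder only stands in for the database's own validation here.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode under strict UTF-8 rules."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


rejected = b"\xed\xa0\xbd\xed\xb8\x81"  # bytes quoted in the error message
print(is_valid_utf8(rejected))           # False

# Decoding leniently shows what the bytes actually hold: two surrogates.
lenient = rejected.decode("utf-8", "surrogatepass")
print([hex(ord(c)) for c in lenient])    # ['0xd83d', '0xde01']

print(is_valid_utf8(b"\xf0\x9f\x98\x81"))  # True, the proper encoding of U+1F601
```

Strict UTF-8 forbids encoding surrogates U+D800 to U+DFFF, so the six-byte form is rejected while the four-byte form is accepted.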

botder avatar Jun 24 '24 18:06 botder

Although that function seems to have stopped most of the errors, I just got another: Incorrect string value: '\xF0\x9F\x8D\x95 D...'

I checked the script, and the string had already gone through utf8_decode_utf16_surrogate_pairs.
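
This error differs from the earlier ones: its first four bytes are well-formed UTF-8. The short Python sketch below decodes them as a check. The thread does not establish the cause; one possibility worth ruling out (an assumption, not confirmed here) is a column or connection still using MySQL's three-byte utf8 (utf8mb3) character set, which cannot store four-byte characters.

```python
payload = b"\xf0\x9f\x8d\x95"       # leading bytes quoted in the new error

char = payload.decode("utf-8")      # strict decoding succeeds here
codepoint = ord(char)

print(f"U+{codepoint:04X}")         # U+1F355
print(len(payload))                 # 4
print(codepoint > 0xFFFF)           # True: outside the Basic Multilingual Plane
```

Because the bytes are already valid, the surrogate repair function has nothing to fix in this string.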

ArranTuna avatar Aug 19 '24 14:08 ArranTuna