[New bug] Certain characters, like emojis, passed from a client aren't received correctly by the server
Describe the bug
After updating my MTA server, I started getting warnings about being unable to insert into MySQL, such as:
`[script] FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1 [Query:INSERT INTO my_history (time, acc, msg) VALUES (1717270392, 'Arran', '😁')]`
I narrowed it down to r22396, which includes MySQL changes: https://buildinfo.multitheftauto.com/index.php?Revision=22396&Branch=
Steps to reproduce
- Windows 64-bit server running v1.6-release-22396.
- A MySQL server you can `dbConnect` to.
- `srun dbcon = dbConnect("mysql", "dbname=dbnamehere;host=localhost", "user", "password")`
- ``srun dbExec(dbcon, "INSERT INTO `table` (message) VALUES (?)", "😁")``
- You get an error like `FAIL: (1366) Incorrect string value: '\xED\xA0\xBD\xED\xB8\x81' for column 'msg' at row 1`
Note that I have the MySQL table set to utf8mb4 and have also tried passing charset=utf8mb4 to dbConnect, but it makes no difference. Only using the revision before this one fixes it. I checked what gets inserted with the previous version, and MySQL Workbench shows `à½Ã¸` for 😁.
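The six bytes in the 1366 error can be decoded by hand as two 3-byte UTF-8 sequences. A minimal sketch in plain Lua 5.1 (not MTA-specific; `decode_3byte` is a hypothetical helper) shows that they encode U+D83D and U+DE01, the UTF-16 high and low surrogates of 😁 (U+1F601), rather than one real character:

```lua
-- Decode the 3-byte UTF-8 sequence starting at byte i of s into a code point.
-- Plain arithmetic (no bit library, no MTA utf8 functions), so it runs on Lua 5.1+.
local function decode_3byte(s, i)
  local b1, b2, b3 = s:byte(i, i + 2)
  return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
end

-- '\xED\xA0\xBD\xED\xB8\x81' from the error, written with decimal escapes
-- because Lua 5.1 (used by MTA) has no \x escapes.
local bad = "\237\160\189\237\184\129"

local first = decode_3byte(bad, 1)
local second = decode_3byte(bad, 4)
print(string.format("U+%04X U+%04X", first, second))  --> U+D83D U+DE01
```

In decimal these are 55357 and 56833. Surrogate code points are not valid on their own in UTF-8, which is consistent with the server rejecting these bytes as an incorrect string value.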
Version
Windows 64-bit v1.6-release-22396
Security Policy
- [X] I have read and understood the Security Policy and this issue is not security related.
It also seems to be having issues loading from the DB, as I was sent this:
[screenshot]
I can't reproduce it on r22493 (x64).
DB: [screenshot]
Code:
```lua
dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")
dbExec(dbcon, "CREATE TABLE IF NOT EXISTS `table` (message TEXT)")
dbExec(dbcon, "INSERT INTO `table` (message) VALUES (?)", "😁")
```
Result: [screenshot]
Another test:
Code:
```lua
dbcon = dbConnect("mysql", "dbname=testdb;host=localhost", "root", "")
dbExec(dbcon, "SET NAMES utf8mb4")

-- Run a query synchronously and return the result rows (nil on failure)
function query(...)
    local queryHandle = dbQuery(dbcon, ...)
    if (not queryHandle) then
        return nil
    end
    local rows = dbPoll(queryHandle, -1)
    return rows
end

iprint(query("SELECT message from `table`"))
-- "\\_" passes a literal \_ to MySQL so LIKE matches an underscore, not any character
iprint(query("SHOW VARIABLES WHERE Variable_name LIKE 'character\\_set\\_%' OR Variable_name LIKE 'collation%'"))
```
Result:
```
INFO: { {
message = "😁"
} }
INFO: { {
Value = "utf8mb4",
Variable_name = "character_set_client"
}, {
Value = "utf8mb4",
Variable_name = "character_set_connection"
}, {
Value = "utf8mb4",
Variable_name = "character_set_database"
}, {
Value = "binary",
Variable_name = "character_set_filesystem"
}, {
Value = "utf8mb4",
Variable_name = "character_set_results"
}, {
Value = "latin1",
Variable_name = "character_set_server"
}, {
Value = "utf8",
Variable_name = "character_set_system"
}, {
Value = "utf8mb4_general_ci",
Variable_name = "collation_connection"
}, {
Value = "utf8mb4_unicode_ci",
Variable_name = "collation_database"
}, {
Value = "latin1_swedish_ci",
Variable_name = "collation_server"
} }
```
Hmm, I wonder whether an outdated MySQL version on the server could be causing the problem with the updated version of MTA. Which MySQL version are you running?
I couldn't reproduce the issue. I've used MySQL Community Server 8.4.0 with a utf8mb4_bin collation table and an MTA Windows x64 server.
The commit 31c68fd4e3ded499abf123d60ccf12d9de99555c is unrelated. I simply couldn't connect to my local MySQL server without this option.
> What MySQL version do you have running?
8.3.0, but I just tested on a very old version (5.7.23) and it's ok too.
I know why this couldn't be reproduced: I just tried the same code in a script and it worked fine, but it fails when run through runcode. It's just so weird that this code would work fine in runcode on... Now it gets even weirder: I can reproduce this bug on older versions that I'm sure worked fine. I'm going to try an older client version to see whether the problem is that the MTA client is sending garbled characters. I just tried the oldest available nightly, 22388, and that also gives the error.
I've found a possible clue: `srun utf8.byte("😁")` gives `Command results: 55357 [number]`.
But when the same character is used in a script, `outputChatBox("UTF8.Byte: "..utf8.byte("😁"))` prints `UTF8.Byte: 3824460688`.
So this bug may have nothing to do with MySQL; rather, certain characters that a player enters in MTA (in the console, GUI, etc.) are handled incorrectly.
I've also tried client-side runcode and a client-side file. Client-side runcode gives `Executing client-side command: utf8.byte("😁")` / `Command results: 55357 [number]`, while the client-side file gives `1026006976`.
This could be why: `srun utf8.byte("😁", 1, 2)` gives `Command results: 55357 [number], 56833 [number]`,
whereas the same call in a script file, `local a, b = utf8.byte("😁", 1, 2) outputChatBox("utf8-byte: "..tostring(a).." "..tostring(b))`, prints `utf8-byte: 3964768944 nil`.
On the client, that one character is being split into two. I just wish there were a way to fix the MySQL insertion errors; what about utf8_bin, for example?
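The two values from `srun utf8.byte("😁", 1, 2)`, 55357 (0xD83D) and 56833 (0xDE01), form a UTF-16 surrogate pair. The standard UTF-16 decoding rule, `(high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000`, turns them back into U+1F601, which correct UTF-8 encodes as four bytes. A plain-Lua sketch of that arithmetic (the helper names `combine_surrogates` and `encode_4byte` are hypothetical, and it runs outside MTA):

```lua
-- Combine a UTF-16 high/low surrogate pair into one Unicode code point.
local function combine_surrogates(high, low)
  assert(high >= 0xD800 and high <= 0xDBFF, "not a high surrogate")
  assert(low >= 0xDC00 and low <= 0xDFFF, "not a low surrogate")
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
end

-- Encode a supplementary code point (U+10000..U+10FFFF) as 4-byte UTF-8,
-- using arithmetic instead of bit operations so it also runs on Lua 5.1.
local function encode_4byte(cp)
  return string.char(
    0xF0 + math.floor(cp / 0x40000),
    0x80 + math.floor(cp / 0x1000) % 0x40,
    0x80 + math.floor(cp / 0x40) % 0x40,
    0x80 + cp % 0x40)
end

local cp = combine_surrogates(55357, 56833)  -- the values srun printed
local bytes = encode_4byte(cp)

-- Hex dump of the encoded bytes
local hex = {}
for i = 1, #bytes do
  hex[i] = string.format("%02X", bytes:byte(i))
end
print(string.format("U+%X", cp))  --> U+1F601
print(table.concat(hex, " "))     --> F0 9F 98 81
```

`F0 9F 98 81` is what a UTF-8 string containing 😁 should hold, versus the six bytes `ED A0 BD ED B8 81` in the MySQL error.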
For your information, the MTA client's UTF-16 to UTF-8 conversion is broken when the UTF-16 string contains surrogate pairs. Use the function below to fix the broken UTF-8 string:
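```lua
-- Rebuilds characters that were split into two UTF-16 surrogates
-- (each encoded as its own 3-byte UTF-8 sequence).
-- Relies on MTA's utf8.next and utf8.char.
function utf8_decode_utf16_surrogate_pairs(text)
    local characters = {}
    local highSurrogate = 0
    local length = 0
    for position, codepoint in utf8.next, text do
        if highSurrogate > 0 and codepoint >= 0xDC00 and codepoint <= 0xDFFF then
            -- High + low surrogate: combine into one supplementary code point
            codepoint = (highSurrogate - 0xD800) * 0x400 + (codepoint - 0xDC00) + 0x10000
            length = length + 1
            characters[length] = utf8.char(codepoint)
            highSurrogate = 0
        elseif codepoint >= 0xD800 and codepoint <= 0xDBFF then
            -- High surrogate: wait for its low half (an unpaired one is dropped)
            highSurrogate = codepoint
        else
            highSurrogate = 0
            -- Drop lone low surrogates; keep every other character as-is
            if codepoint < 0xDC00 or codepoint > 0xDFFF then
                length = length + 1
                characters[length] = utf8.char(codepoint)
            end
        end
    end
    return table.concat(characters)
end
```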
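The function above depends on MTA's `utf8` library. For testing outside MTA, or anywhere only raw bytes are available, the same repair can be sketched directly on the bytes: each surrogate half shows up as `ED A0..BF xx`, so a high/low pair is six bytes that can be rewritten as one 4-byte sequence. This is a hypothetical plain-Lua 5.1 variant (`fix_surrogate_bytes` and `surrogate_at` are illustrative names), and it assumes unpaired surrogates should be dropped:

```lua
-- Return the surrogate code point encoded at byte i of s (ED A0..BF 80..BF),
-- or nil if the bytes there are not a UTF-8-encoded surrogate.
local function surrogate_at(s, i)
  local b1, b2, b3 = s:byte(i, i + 2)
  if b1 == 0xED and b2 and b3 and b2 >= 0xA0 and b2 <= 0xBF
      and b3 >= 0x80 and b3 <= 0xBF then
    return 0xD000 + (b2 - 0x80) * 0x40 + (b3 - 0x80)
  end
  return nil
end

-- Rewrite each 6-byte encoded surrogate pair as proper 4-byte UTF-8.
-- Assumption: lone (unpaired) surrogates are dropped.
local function fix_surrogate_bytes(s)
  local out, i = {}, 1
  while i <= #s do
    local hi = surrogate_at(s, i)
    if not hi then
      -- Ordinary byte: copy it unchanged
      out[#out + 1] = s:sub(i, i)
      i = i + 1
    else
      local lo = surrogate_at(s, i + 3)
      if hi <= 0xDBFF and lo and lo >= 0xDC00 then
        -- Valid pair: combine and emit the 4-byte UTF-8 encoding
        local cp = (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000
        out[#out + 1] = string.char(
          0xF0 + math.floor(cp / 0x40000),
          0x80 + math.floor(cp / 0x1000) % 0x40,
          0x80 + math.floor(cp / 0x40) % 0x40,
          0x80 + cp % 0x40)
        i = i + 6
      else
        -- Unpaired surrogate: skip its 3 bytes
        i = i + 3
      end
    end
  end
  return table.concat(out)
end
```

Applied to the bytes from the original error, it turns `ED A0 BD ED B8 81` into `F0 9F 98 81`, the normal UTF-8 encoding of 😁, and leaves already-valid text untouched.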
I see that character_set_server and collation_server are set to latin1.
You can try changing the server's character set and collation to utf8mb4 by editing my.ini like this:
```ini
[mysqld]
collation_server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
character_set_server = utf8mb4
```
Thanks for that code, botder; it works as a hack fix for this bug.
If my function worked for you, then this isn't an actual issue with the database or the libraries used, but with the conversion of user input to UTF-8. Your database protected you from inserting garbage UTF-8 bytes.
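Along those lines, a script could check a value for leftover encoded surrogates before calling `dbExec` and log or skip it instead of letting the insert fail. A minimal sketch; `has_encoded_surrogate` is a hypothetical helper, and it only looks for the `ED A0..BF` byte pattern seen in the 1366 error, not for UTF-8 validity in general:

```lua
-- True if s contains a UTF-8-encoded surrogate half (lead byte ED followed
-- by A0..BF), the byte pattern from the "Incorrect string value" error.
-- Decimal escapes are used so the pattern also works on Lua 5.1.
local function has_encoded_surrogate(s)
  return s:find("\237[\160-\191]") ~= nil
end

print(has_encoded_surrogate("\237\160\189\237\184\129"))  --> true
print(has_encoded_surrogate("\240\159\152\129"))          --> false
```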
Although that function seems to have stopped most of the errors, I just got another one: `Incorrect string value: '\xF0\x9F\x8D\x95 D...'`
I checked the script, and the string had already gone through `utf8_decode_utf16_surrogate_pairs`.
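Unlike the first error, `\xF0\x9F\x8D\x95` is a well-formed 4-byte UTF-8 sequence (U+1F355, 🍕), which suggests the surrogate repair worked for this value. When MySQL rejects a valid 4-byte character, a common cause is a column, table, or connection that uses the 3-byte `utf8`/`utf8mb3` character set instead of `utf8mb4`; whether that applies here is an assumption to verify, for example with `SHOW CREATE TABLE` and the `SHOW VARIABLES` query earlier in the thread. A plain-Lua sketch to locate such characters, e.g. to log which value triggers the error (`first_4byte_char` is a hypothetical helper):

```lua
-- Find the first 4-byte UTF-8 sequence (lead byte F0..F4 plus three
-- continuation bytes) in s; return its byte position and code point.
-- Characters like these (most emojis) need utf8mb4 in MySQL.
local function first_4byte_char(s)
  local i = s:find("[\240-\244][\128-\191][\128-\191][\128-\191]")
  if not i then
    return nil
  end
  local b1, b2, b3, b4 = s:byte(i, i + 3)
  local cp = (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
    + (b3 - 0x80) * 0x40 + (b4 - 0x80)
  return i, cp
end

-- '\xF0\x9F\x8D\x95 D' from the second error, in decimal escapes
local pos, cp = first_4byte_char("\240\159\141\149 D")
print(string.format("position %d: U+%X", pos, cp))  --> position 1: U+1F355
```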