VarChar encoding problem.
see this: https://github.com/tediousjs/tedious/blob/c22ed1866fafca127f7cf844f65b022380cfd7e9/src/data-types/varchar.js#L55-L68
You use the ascii encoding, is it incorrect?
I see you consider the encoding in value-parser.js.
So I find that the input parameter encoding is not correct, the select result is correct.
maybe relate to https://github.com/tediousjs/tedious/pull/113 , but it should use iconv-lite to encode string.
@heroboy varchar stores 1 byte (ascii) characters, there's an equivalent function in nvarchar datatype object to handle the Unicode characters https://github.com/tediousjs/tedious/blob/85d96fee8fd3c39b08513c64deff40f4141d9e62/src/data-types/nvarchar.js#L55-L62
If you have faced an issue would you please send a repo?
hi @Hadis-Fard, Please try this:
db.acquire((err, conn) =>
{
if (err) return;
var request = new tedious.Request("select '中文',@v1,@v2,@v3; set @v4 = '中文';", function (err)
{
if (err)
{
console.log(err);
return;
}
conn.release();
});
const iconv = require('iconv-lite');
request.on('row', cols =>
console.log(cols.map(x => x.value))
);
request.on('returnValue', (name, value) =>
{
console.log(name, value);
})
console.log(iconv.encode('中文', 'cp936'));
request.addParameter('v1', tedious.TYPES.VarChar, '中文')
request.addParameter('v2', tedious.TYPES.NVarChar, '中文')
request.addParameter('v3', tedious.TYPES.VarChar, iconv.encode('中文', 'cp936'),{length:128})
request.addOutputParameter('v4', tedious.TYPES.VarChar, null);
conn.execSql(request);
})
The output is
[ '中文', '-', '中文', '����' ] //shoud be [ '中文', '中文', '中文', '中文' ], the varchar values are incorrect.
v4 中文
I tried use buffer as the parameter value,but it didn't work. Becuase in VarChar.prototype.validate you convert it to string if it is not string. I changed it to
validate: function validate(value) {
if (value == null) {
return null;
}
if (typeof value !== 'string' && !Buffer.isBuffer(value)) { // I added this isBuffer
if (typeof value.toString !== 'function') {
return TypeError('Invalid string.');
}
value = value.toString();
}
return value;
}
And the output would be
[ '中文', '-', '中文', '中文' ]
v4 中文
The v3 become correct
And I don't understand what you mean about varchar stores 1 byte (ascii) characters. As I know varchar is a byte string, but I am not sure the encoding is depend on SQLServer's settings or client OS settings or something else. Maybe I guess SQLServer's codepage setting is used to convert varchar to nvarchar, otherwise the varchar is always a byte string, what it is is explained by the client side.
Sorry my English is not good. Hope you undenstand what I was saying.
as it is mentioned in MSDN varchar/char is non-Unicode string data, and nvarchar/nchar is unicode string data. so when you have a Unicode string data you should use nvarchar.
Also we want to validate the char parameters to be string or if it is possible we convert it to string for the case of ' iconv.encode('中文', 'cp936')' you should use VarBinary type, as the iconv returns buffer.
If a table column type is varchar, it can correctly store '中文'( Do you think it is an unicode string data that can't be stored?)。For example, the sql is select * from table where some_varchar_column = @value,what the @value should be? In other engine like ado, I can always use varchar correctly. Of cause, in tedious I can use nvarchar, but it will be an implicity type cast in SQLServer.
Another example, set @value = '中文',if the @value is varchar。In tedious, I can correctly get '中文' in returnValue event. This is what I mean "the input parameter encoding is not correct". And prove that when I have a Unicode string data I can use varchar.
I see in varchar.js the function resolveLength think value can be a buffer. writeParameterData calls writable-tracking-buffer.write, it also think value can be a buffer. So I think it is a bug that validate doesn't allow value to be a buffer.
What I do iconv.encode('中文', 'cp936') is bacuse varchar.js doesn't correctly encode the string,I manualy do it. Not becuase it is a varbinary. So please understand Unicode, encoding, string, byte, char. You English speaking people seldom suffer from it.
So please at least make the parameter value can be a buffer. It's simple to fix.
@heroboy 👋
Sorry you're running into this issue with tedious. 🙇Most people that use tedious use an ascii compatible encoding when using varchar columns, and that's the reason why this issue did not come up earlier.
You're right, varchar columns can store data in other encodings, like cp936, and tedious already supports this correctly when reading data out of a table, where it uses iconv-lite to convert from the table's encoding back into the JS native encoding.
Unfortunately, query parameters do not support specifying an encoding option, but it should be possible to add this. I'm against allowing Buffers to be passed in for string types like varchar or nvarchar - the current type handling is already a mess and confusing in many cases, adding more special exceptions like this will just make it harder to clean this up in the future. 😞
I think the correct way to support this is to store the SQL Collation that is send to the client via the ENVCHANGE token, and use that collation automaticall for all non-unicode columns (char, varchar, text).
@heroboy @Hadis-Fard What do you think?
Actually, I just saw that we have an internal sqlCollationChange event, that is not used anywhere. Wiring this all up correctly might be a bit tricky, but I believe this will be the "correct" solution for the encoding/collation issue here.
@arthurschreiber Yes I think use the sql server 's collation setting is correct. When I read tedious source code, I find there is a codepage parameter when parsing value, so the output parameter is correct. But I can't find codepage like setting that sql server send to you when connecting to the sqlserver. I am familiar with tedious protocol.
When writing code, we should always keep in mind that when converting between string and binary buffer, we should always specify an encoding. Using Buffer#toString() or ascii is temporary, not a correct solution.
If you can read SqlServer's encoding setting. https://stackoverflow.com/questions/5182164/sql-server-default-character-encoding It is best and correct. If just adding an encoding in tedious option(or request.addParameter option), I think it is good enough, it makes teidous more flexible that user can store any encoding in varchar columns(though it is incorrect when sorting the column).
Thanks.
Using collation received in ENVCHANGE wouldn't be an ideal solution, as it's possible to have different collation for column, database and SQL Server 🤔
Both ADO and JDBC driver are sending string as NVarchar by default. Here is the TSQL sent by the drivers (captured via SQL Profiler)
-- JDBC
exec sp_executesql N'INSERT INTO c936_table (c1) VALUES(@P0)',N'@P0 nvarchar(4000)',N' 中文'
--ADO
exec sp_executesql N'INSERT INTO [dbo].[c936_table] ([c1]) VALUES (@c1);',N'@c1 nvarchar(11)',@c1=N'中文'
--tedious
exec sp_executesql @statement=N'insert into c936_table values (@c1);',@params=N'@c1 varchar(2)',@c1='-‡'
I was sending value '中文' as string/varchar in all 3 cases, and this was the result. Passing string as NVarchar by default seems to be a better alternative than using database collation on all of the char/varchar columns. Using NVarchar too comes at a cost, but considering that collation can be set at any level (expression, column, database or Server) its better to have a solution to handle them all or none at all.
@v-suhame You are right to me. Thanks.
I'm reopening because this is still an issue - see the discussion in https://github.com/tediousjs/tedious/issues/1294.
@heroboy You posted some code to reproduce this behaviour over in https://github.com/tediousjs/tedious/issues/723#issuecomment-378097495. Do I need to tweak any additional server or database settings (e.g. change the server collation) to reproduce this correctly? 🙇♂️
@arthurschreiber
No. I think what ever collation sql server is, it will produce incorrect output.
But if you modify the validate function in my sample code, and want to make this work:
request.addParameter('v3', tedious.TYPES.VarChar, iconv.encode('中文', 'cp936'),{length:128})
You should set sql server to match the cp936 encoding.