
sqlserver_column.py: Handle string dtype of nvarchar

Open Cogito opened this issue 9 months ago • 5 comments

nvarchar is not considered a string in the base adapter, nor in the fabric adapter. This leads to issues when dealing with nvarchar columns, such as in https://github.com/dbt-msft/dbt-sqlserver-utils/blob/master/macros/sql/union.sql

In this commit we add nvarchar to the list of string dtypes, return the correct string_size for nvarchar (which is doubled by default), and use the correct dtype in string_type.

I believe this mishandling of nvarchar is the root cause of https://github.com/dbt-msft/dbt-sqlserver/issues/446

I ran into this issue when unioning multiple relations which used nvarchar. The generated code had no string length specified for the nvarchar fields, which resulted in the default string length of 30 being used.

I'm not sure whether the way I handle string_type is best, since it is a class method in the base class, but I think we want it to be instance-specific, as that is how the dtypes are handled.

Additionally, I'm not sure why char_size is twice the string length for nvarchar, but I noticed it in my testing, which is why I halve it in string_size(). I do understand that nvarchar uses twice as many bytes as varchar, so I suspect it is related, but I don't know whether we would want to fix char_size to be the actual number of characters or just correct for it in string_size().
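
For reference, here is a minimal sketch of the idea, assuming a SQLServerColumn subclass of dbt's base Column class; the method bodies, the 256-character fallback, and the instance-level string_type override are illustrative assumptions, not the exact code in this PR.

```python
# Illustrative sketch only, assuming the base adapter's Column interface
# (dtype, char_size, is_string, string_size, string_type); not the PR's code.
from dbt.adapters.base.column import Column


class SQLServerColumn(Column):
    def is_string(self) -> bool:
        # Treat nvarchar/nchar as strings alongside the existing string dtypes.
        return self.dtype.lower() in ["varchar", "char", "nvarchar", "nchar"]

    def string_size(self) -> int:
        if not self.is_string():
            raise RuntimeError("Called string_size() on non-string field!")
        if self.char_size is None:
            return 256  # assumed fallback when no length is reported
        if self.dtype.lower() in ["nvarchar", "nchar"]:
            # char_size comes from sys.columns.max_length, which is in bytes;
            # nvarchar/nchar use 2 bytes per character, so halve it.
            return self.char_size // 2
        return self.char_size

    def string_type(self, size: int) -> str:
        # Made instance-specific so the emitted type keeps the column's own
        # dtype (nvarchar vs varchar) instead of a class-level default.
        return f"{self.dtype}({size})"
```

Overriding string_type on the instance is exactly the tension mentioned above: the base class defines it as a class method, so this sketch simply shadows it on the subclass.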

Cogito avatar Apr 07 '25 08:04 Cogito

Apologies for the noise; I renamed the branch, which closed the first pull request.

Cogito avatar Apr 07 '25 08:04 Cogito

> Additionally, I'm not sure why char_size is twice the string length for nvarchar, but I noticed it in my testing, which is why I halve it in string_size(). I do understand that nvarchar uses twice as many bytes as varchar, so I suspect it is related, but I don't know whether we would want to fix char_size to be the actual number of characters or just correct for it in string_size().

As you say, nvarchar uses twice as many bytes to store the same length. Looking at the macros, we grab the max_length value from the sys.columns view, and I suspect that ends up being reported as the char_size.

That means that for nvarchar columns it's actually reporting double the length of the string they can store. The code you're adding would handle this, but it might need to be handled in the macro itself if the codebase expects char_size to be the number of characters a column can store, so that it's reported correctly throughout the codebase.

I assume this means nchar columns would also be affected.

Benjamin-Knight avatar Apr 10 '25 14:04 Benjamin-Knight

My reading of the existing code is that char_size should be the declared maximum number of characters allowed, not the number of bytes allocated.

Anywhere it is used, it is assumed to be the number of characters. To date I doubt many are using it at all, as nvarchar is not currently interpreted as a string type, so the width of any columns being created is set to the default (which I think is 30 everywhere?).

As per https://stackoverflow.com/questions/37187654/alternative-to-sys-columns-max-length we could potentially use INFORMATION_SCHEMA.COLUMNS instead of sys.columns in https://github.com/microsoft/dbt-fabric/blob/d3b8665f4e116d0fcbd14e544071f5d3cf7eb15e/dbt/include/fabric/macros/adapters/columns.sql#L9

This would probably require us to maintain a fork of that macro in https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/adapter/columns.sql unless we can get it fixed in the fabric adapter. Fabric doesn't have nvarchar datatypes, so I'm not sure whether it makes sense for them to fix it there or not.
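
To illustrate the difference between the two metadata views with a hypothetical table (not one from the repo): for an nvarchar(50) column, sys.columns reports the byte count while INFORMATION_SCHEMA.COLUMNS reports the declared character count.

```sql
-- Hypothetical example table, just to show where the two views disagree.
create table dbo.nvarchar_demo (name nvarchar(50));

-- sys.columns reports max_length in bytes: 100 for nvarchar(50).
select c.max_length
from sys.columns as c
where c.object_id = object_id('dbo.nvarchar_demo') and c.name = 'name';

-- INFORMATION_SCHEMA reports the declared character count: 50.
select character_maximum_length
from information_schema.columns
where table_name = 'nvarchar_demo' and column_name = 'name';
```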

Cogito avatar Apr 14 '25 05:04 Cogito

A SQL Server-specific version of the macro that gets the data seems the safest method to me since, as you say, it will not be implemented at the Fabric level.

That should mean that the only code change would need to be the update to is_string to handle nchar and nvarchar.

We will want a variation of https://github.com/microsoft/dbt-fabric/blob/main/tests/functional/adapter/test_column_types.py as well to support the new nvarchar and nchar handling.
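
Something along these lines might work, loosely following the functional-test conventions in dbt-tests-adapter; the model, class, and assertion below are placeholders, not the actual fixtures from test_column_types.py.

```python
# Rough sketch of a functional test; names and fixtures are placeholders,
# loosely following dbt-tests-adapter pytest conventions.
import pytest
from dbt.tests.util import run_dbt, relation_from_name

NVARCHAR_MODEL_SQL = """
select
    cast('abc' as nvarchar(50)) as nvarchar_col,
    cast('abc' as nchar(10)) as nchar_col
"""


class TestNVarcharColumnTypesSQLServer:
    @pytest.fixture(scope="class")
    def models(self):
        return {"nvarchar_model.sql": NVARCHAR_MODEL_SQL}

    def test_nvarchar_lengths_reported_in_characters(self, project):
        results = run_dbt(["run"])
        assert len(results) == 1

        # Ask the adapter for the columns and check that nvarchar/nchar are
        # treated as strings and sized in characters, not bytes.
        relation = relation_from_name(project.adapter, "nvarchar_model")
        with project.adapter.connection_named("_test"):
            columns = project.adapter.get_columns_in_relation(relation)

        sizes = {c.name: c.string_size() for c in columns if c.is_string()}
        assert sizes.get("nvarchar_col") == 50
        assert sizes.get("nchar_col") == 10
```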

Benjamin-Knight avatar Apr 14 '25 09:04 Benjamin-Knight

Bumping this: what is needed to get it merged? @Benjamin-Knight do you know?

Cogito avatar Sep 29 '25 23:09 Cogito