crate icon indicating copy to clipboard operation
crate copied to clipboard

Deduce the inner type of a dynamic array that can satisfy mixed types

Open hlcianfagna opened this issue 2 years ago • 4 comments

CrateDB version

5.6.1 and 5.5.3

CrateDB setup information

CRATE_HEAP_SIZE=1g and discovery.type=single-node

Problem description

I came across this importing some complex JSON data. Strings with JSON containing arrays with mixed string/numeric types can be casted to OBJECT without any issues, but when it comes to loading this data into OBJECT columns it seems that we need to either define the schema in advance, explicitly mentioning ARRAY(TEXT) or first inserting a record with only strings in the array, or we need to cast to OBJECT at runtime sacrificing indexing.

Steps to Reproduce

CREATE TABLE testcastjson (obj OBJECT,txt TEXT);

INSERT INTO testcastjson (txt)
SELECT '{"field1":["abc",123]}';

SELECT txt::OBJECT FROM testcastjson;
--> no issues

UPDATE testcastjson SET obj = txt;

Actual Result

UPDATE OK, 0 rows affected (0.101 sec)

Expected Result

1 row affected

hlcianfagna avatar Jan 31 '24 09:01 hlcianfagna

Hi @hlcianfagna, thanks for reporting. To my knowledge we do not supported mixed typed arrays.

SELECT txt::OBJECT FROM testcastjson;
--> no issues

worked because we treated field1 and its sub type as undefined internally,

cr> SELECT pg_typeof(txt::OBJECT['field1'][1]) FROM testcastjson;
+--------------------------------------------------------+
| pg_catalog.pg_typeof(cast(txt AS object)['field1'][1]) |
+--------------------------------------------------------+
| undefined                                              |
+--------------------------------------------------------+
SELECT 1 row in set (0.006 sec)

jeeminso avatar Jan 31 '24 23:01 jeeminso

That explains it, thank you @jeeminso Would you say we should make some clarification in the documentation or maybe consider this a feature request?

hlcianfagna avatar Feb 01 '24 07:02 hlcianfagna

We follow https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-ver16 for implicit conversions and int to txt is not supported. But maybe we can treat this as a feature. How do you think @seut ?


Btw, txt to int conversion must work but does not:

cr> INSERT INTO testcastjson (txt) SELECT '{"field1":["1",123]}';                                                                                               
INSERT OK, 1 row affected  (0.045 sec)
cr> UPDATE testcastjson SET obj = txt;                                                                                           
UPDATE OK, 0 rows affected  (0.004 sec)

which I think is a bug. update: opened https://github.com/crate/crate/issues/15526

jeeminso avatar Feb 01 '24 14:02 jeeminso

Thank you @seut for your input. So we can consider this as a feature by allowing a session setting to adjust the type precedence i.e. - treating text as the highest precedence. (I would assume that the session setting kicks in after the first insert attempt fails due to mixed type.)

jeeminso avatar Feb 08 '24 14:02 jeeminso