Deduce the inner type of a dynamic array that can satisfy mixed types
CrateDB version
5.6.1 and 5.5.3
CrateDB setup information
CRATE_HEAP_SIZE=1g and discovery.type=single-node
Problem description
I came across this importing some complex JSON data.
Strings with JSON containing arrays with mixed string/numeric types can be casted to OBJECT without any issues, but when it comes to loading this data into OBJECT columns it seems that we need to either define the schema in advance, explicitly mentioning ARRAY(TEXT) or first inserting a record with only strings in the array, or we need to cast to OBJECT at runtime sacrificing indexing.
Steps to Reproduce
CREATE TABLE testcastjson (obj OBJECT,txt TEXT);
INSERT INTO testcastjson (txt)
SELECT '{"field1":["abc",123]}';
SELECT txt::OBJECT FROM testcastjson;
--> no issues
UPDATE testcastjson SET obj = txt;
Actual Result
UPDATE OK, 0 rows affected (0.101 sec)
Expected Result
1 row affected
Hi @hlcianfagna, thanks for reporting. To my knowledge we do not supported mixed typed arrays.
SELECT txt::OBJECT FROM testcastjson;
--> no issues
worked because we treated field1 and its sub type as undefined internally,
cr> SELECT pg_typeof(txt::OBJECT['field1'][1]) FROM testcastjson;
+--------------------------------------------------------+
| pg_catalog.pg_typeof(cast(txt AS object)['field1'][1]) |
+--------------------------------------------------------+
| undefined |
+--------------------------------------------------------+
SELECT 1 row in set (0.006 sec)
That explains it, thank you @jeeminso Would you say we should make some clarification in the documentation or maybe consider this a feature request?
We follow https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-ver16 for implicit conversions and int to txt is not supported. But maybe we can treat this as a feature. How do you think @seut ?
Btw, txt to int conversion must work but does not:
cr> INSERT INTO testcastjson (txt) SELECT '{"field1":["1",123]}';
INSERT OK, 1 row affected (0.045 sec)
cr> UPDATE testcastjson SET obj = txt;
UPDATE OK, 0 rows affected (0.004 sec)
which I think is a bug. update: opened https://github.com/crate/crate/issues/15526
Thank you @seut for your input. So we can consider this as a feature by allowing a session setting to adjust the type precedence i.e. - treating text as the highest precedence. (I would assume that the session setting kicks in after the first insert attempt fails due to mixed type.)