`octet_length()` function not working for StringView columns (SQLancer)

Open 2010YOUY01 opened this issue 1 year ago • 1 comments

Describe the bug

Native StringView support for octet_length() was added in https://github.com/apache/datafusion/issues/11858. However, it does not work for a StringView column inside a table.

See the reproducer in datafusion-cli (compiled from the latest main using cargo run, commit a58416c2e). The last query should work, since this function should already have StringView support.

DataFusion CLI v41.0.0
> create table t1(v1 text);
0 row(s) fetched.
Elapsed 0.058 seconds.

> insert into t1 values ('DataFusion'), ('datafusion');
+-------+
| count |
+-------+
| 2     |
+-------+
1 row(s) fetched.
Elapsed 0.047 seconds.

> create table t1_stringview as
select arrow_cast(v1, 'Utf8View') as v1
from t1;
0 row(s) fetched.
Elapsed 0.011 seconds.

# Now we have two equivalent tables `t1` and `t1_stringview`
# The difference is physical representation for string column (StringArray and StringViewArray)

> select octet_length(v1) from t1;
+---------------------+
| octet_length(t1.v1) |
+---------------------+
| 10                  |
| 10                  |
+---------------------+
2 row(s) fetched.
Elapsed 0.006 seconds.

> select octet_length(v1) from t1_stringview;
Arrow error: Compute error: length not supported for Utf8View
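
For reference, the result expected from the failing query can be sketched in plain Rust. This is only an illustration of the semantics (the number of UTF-8 bytes per row, with NULL passed through), not DataFusion's or arrow-rs's implementation. The function name, the `Option<&str>` slice standing in for a nullable string column, and the `i32` result type are assumptions made for this sketch.

```rust
/// Hypothetical stand-in for `octet_length` over a nullable string column.
/// The slice of `Option<&str>` models the column; it is not an Arrow array.
fn octet_length(column: &[Option<&str>]) -> Vec<Option<i32>> {
    column
        .iter()
        // `str::len` counts UTF-8 bytes, not characters; `None` (NULL) stays `None`.
        .map(|value| value.map(|s| s.len() as i32))
        .collect()
}
```

Calling `octet_length(&[Some("DataFusion"), Some("datafusion")])` yields `[Some(10), Some(10)]`, the same values the query against `t1` returns. The answer depends only on the string contents, so it should be identical whether the column is stored as a StringArray or a StringViewArray.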

To Reproduce

No response

Expected behavior

No response

Additional context

Found by SQLancer https://github.com/apache/datafusion/issues/11030

Aug 24 '24 11:08 2010YOUY01

take

Aug 24 '24 22:08 Omega359

This issue will be fixed once the arrow_string dependency is updated.

Sep 12 '24 23:09 Omega359

The arrow-string dependency was updated to 53.1.0, which includes the change from apache/arrow-rs#6305. I'll work on a PR to verify the fix in octet_length().

Oct 11 '24 00:10 Omega359