[bug] concat:vchar is not very usable
concat:vchar overload is not very usable:
impls:
- args:
- value: "varchar<L1>"
name: "input"
variadic:
min: 1
options:
null_handling:
values: [ IGNORE_NULLS, ACCEPT_NULLS ]
return: "varchar<L1>"
Each argument must be varchar<L1> -- that is it must be varchar arguments with exactly the same
length; and it must produce same result (varchar<L1>).
That means that arguments of length L1 cannot be concatenated together; in general
the length of each argument must be at most L1/n (on average). While this works, it's probably not
the desired behavior.
The additional usability issue with this overload is that if using e.g. substrait-go library to construct
varchar literals, the produced literals carry the exact length of the argument (https://github.com/substrait-io/substrait-go/blob/1a2a6ae5fd9986045f69d6ec590875c9af45b88f/literal/utils.go#L296)
This means that you cannot construct an expression to call this overload with e.g.
concat(NewVarChar('hello'), NewVarChar(' world'))
Given that concat already provides variadic overload for strings, this version is superfluous, and error prone. It should probably be removed.
The variadic attribute should allow any number of arguments as long as the minimum of one is provided. There is no maximum specified.
The variadic attribute should allow any number of arguments as long as the minimum of one is provided. There is no maximum specified.
Right; the issue is not in the number of arguments, but the fact that all arguments must be varchar<L1> type.
Ah, I see. Both DuckDB and Snowflake use a maximum length for their varchar literals which is different than Substrait Go's behavior. The L1 requirement does seem to be the issue as it would restrict the input and output to have the same length. The output length could be calculated given the input lengths if we had some way of referencing the lengths of the variadic count. We don't have that capability yet.
The L1 requirement does seem to be the issue as it would restrict the input and output to have the same length.
This is the issue I assumed we're talking about. Specifically, the output having the same length as the input.
For
but the fact that all arguments must be varchar<L1> type.
The expectation is that the producer should/can upcast all of the input types. That being said, I don't know if this function is particularly usable. I think a better concat definition for general use might be
concat(varchar<N>, varchar<M>) -> varchar<N + M>
with no variadics, unless we want a concat version that truncates the output.
Good point @EpsilonPrime
Both DuckDB and Snowflake use a maximum length for their varchar literals
It's an option; literal.NewVarChar is a helper that takes a string -- it is reasonable for it to use VarChar<maxL> imo. On the other hand, if the caller cares that the varchars are of specific length, they can use expression.NewLiteral[T] where they can be explicit about the type.