Projects from Substrait do not include input fields as output fields
Describe the bug
According to the Substrait specification, project relations emit all of the input fields followed by the list of new expressions. DataFusion only emits the new expressions.
To Reproduce
Pass a Substrait plan such as the following to DataFusion. (A literal can be used instead of a window function, but this is what I had handy.)
{
  "extensionUris": [
    {
      "extensionUriAnchor": 1,
      "uri": "/functions_arithmetic.yaml"
    }
  ],
  "extensions": [
    {
      "extensionFunction": {
        "extensionUriReference": 1,
        "functionAnchor": 1,
        "name": "row_number"
      }
    }
  ],
  "relations": [
    {
      "root": {
        "input": {
          "project": {
            "common": {
              "direct": {}
            },
            "input": {
              "read": {
                "common": {
                  "direct": {}
                },
                "baseSchema": {
                  "names": [
                    "user_id",
                    "name",
                    "paid_for_service"
                  ],
                  "struct": {
                    "types": [
                      {
                        "string": {
                          "nullability": "NULLABILITY_REQUIRED"
                        }
                      },
                      {
                        "string": {
                          "nullability": "NULLABILITY_REQUIRED"
                        }
                      },
                      {
                        "bool": {
                          "nullability": "NULLABILITY_REQUIRED"
                        }
                      }
                    ],
                    "nullability": "NULLABILITY_REQUIRED"
                  }
                },
                "namedTable": {
                  "names": [
                    "users"
                  ]
                }
              }
            },
            "expressions": [
              {
                "windowFunction": {
                  "functionReference": 1,
                  "sorts": [
                    {
                      "expr": {
                        "selection": {
                          "directReference": {
                            "structField": {
                              "field": 1
                            }
                          },
                          "rootReference": {}
                        }
                      },
                      "direction": "SORT_DIRECTION_ASC_NULLS_FIRST"
                    }
                  ],
                  "upperBound": {
                    "unbounded": {}
                  },
                  "lowerBound": {
                    "unbounded": {}
                  },
                  "outputType": {
                    "i64": {
                      "nullability": "NULLABILITY_REQUIRED"
                    }
                  },
                  "invocation": 3
                }
              }
            ]
          }
        },
        "names": [
          "user_id",
          "name",
          "paid_for_service",
          "row_number"
        ]
      }
    }
  ],
  "version": {
    "minorNumber": 52,
    "producer": "spark-substrait-gateway"
  }
}
Expected behavior
The result of the plan above should be 4 columns, matching the 4 names provided. Currently, DataFusion returns just one column (row_number) for the project.
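As a rough illustration of the expected behavior, the sketch below models a project relation's direct output as the input fields followed by one column per expression. The helper name `project_output_fields` is hypothetical and is not part of DataFusion or Substrait; it only mirrors the rule described in this report, using the field names from the plan above.

```python
def project_output_fields(input_fields, expression_names):
    """Model the direct output of a Substrait project relation.

    Hypothetical helper for illustration only: the output is every input
    field, in order, followed by one column per project expression.
    """
    return list(input_fields) + list(expression_names)


# Field names taken from the plan in this report.
input_fields = ["user_id", "name", "paid_for_service"]
expression_names = ["row_number"]

# What the specification implies the project should produce.
expected = project_output_fields(input_fields, expression_names)

# What this issue reports DataFusion producing: only the new expressions.
reported = list(expression_names)

print(expected)
print(reported)
```

The length of `expected` lines up with the four entries in the root relation's "names" list, while `reported` has only one entry.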
take
IIRC DF also never reads the "emit" directive at all, which I think would need to be fixed as a precursor to fixing this issue, as otherwise there's no way to drop columns.
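To make the dependency on "emit" concrete, here is a minimal sketch of how an emit output mapping could be applied on top of a relation's direct output. It assumes the mapping is a list of zero-based indices into the direct output; `apply_emit` is a hypothetical helper, not DataFusion's consumer code.

```python
def apply_emit(fields, output_mapping=None):
    """Hypothetical helper: apply an emit-style mapping to a direct output.

    With no mapping (the "direct" case) every field passes through unchanged.
    With a mapping, each entry is an index into `fields`, so columns can be
    dropped, reordered, or repeated.
    """
    if output_mapping is None:
        return list(fields)
    return [fields[i] for i in output_mapping]


# Direct output of the project in this issue: input fields, then expressions.
direct_output = ["user_id", "name", "paid_for_service", "row_number"]

kept_all = apply_emit(direct_output)            # "direct": keep everything
only_new = apply_emit(direct_output, [3])       # keep just the new expression
reordered = apply_emit(direct_output, [3, 0])   # new expression first, then user_id

print(kept_all)
print(only_new)
print(reordered)
```

Under this model, a producer that wants only the new expression would have to say so through the mapping, which is why honoring "emit" matters once the input fields are passed through by default.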
The changes in https://github.com/apache/datafusion/pull/13127 also addressed this.