Schema qualifiers missing on call to `LogicalPlan::schema`
Describe the bug
When running the query
SELECT * FROM a
UNION SELECT * FROM b
UNION SELECT * FROM c
ORDER BY b NULLS FIRST, c NULLS FIRST
I noticed that the schema returned from `&plan.schema()` does not have qualifiers in the result:
DFSchema { fields: [DFField { qualifier: None, field: Field { name: "b", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: Field { name: "c", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }
The EXPLAIN output for the Union is:
Union
  Projection: #a.b, #a.c
    TableScan: a projection=[b, c]
  Projection: #b.b, #b.c
    TableScan: b projection=[b, c]
  Projection: #c.b, #c.c
    TableScan: c projection=[b, c]
To Reproduce
#[test]
fn union_schema_qualifier_missing() -> Result<()> {
    let schema = Schema::new(vec![
        Field::new("b", DataType::Int32, false),
        Field::new("c", DataType::Int32, false),
    ]);

    let table_a = table_scan(Some("a"), &schema, Some(vec![0, 1]))
        .unwrap()
        .project(vec![col("b"), col("c")])
        .unwrap();
    let table_b = table_scan(Some("b"), &schema, Some(vec![0, 1]))
        .unwrap()
        .project(vec![col("b"), col("c")])
        .unwrap();
    let table_c = table_scan(Some("c"), &schema, Some(vec![0, 1]))
        .unwrap()
        .project(vec![col("b"), col("c")])
        .unwrap();

    let union_plan = table_a
        .union(table_b.build()?)?
        .union(table_c.build()?)?
        .build()?;

    // Get the schema from the resulting logical plan and ensure it has qualifiers
    let schema = union_plan.schema();
    assert_ne!(
        schema.fields()[0].qualifier(),
        None
    );
    Ok(())
}
Expected behavior
The qualifiers should be present in the resulting schema.
Additional context
None
Am I wrong in thinking there should be a qualifier here? Maybe the absence of one is actually correct? ANSI SQL does require that all fields for a Union have the same name and the same order, so maybe it doesn't matter?
Without diving very deep, having no qualifier seems correct: the results from the different tables (a, b, c) are combined into a new result set, which shouldn't have a qualifier.
Usually the column names in the UNION result set are equal to the column names in the first SELECT statement in the UNION.
Fixed by https://github.com/apache/arrow-datafusion/pull/5452.
@alamb Can this issue be closed?
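For anyone landing here later, the behavior described in the comments above (output column names taken from the first input, no qualifier on the combined result) can be sketched with a toy model. This is only an illustration under stated assumptions: the `Field` struct and `union_schema` function below are hypothetical stand-ins written for this thread, not DataFusion's `DFField`/`DFSchema` types and not the code from the linked PR.

```rust
/// Hypothetical, simplified stand-in for a schema field.
/// This is NOT DataFusion's `DFField`; it only models a name and a qualifier.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    /// Table qualifier such as `a` in `a.b`; `None` when unqualified.
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Derive the output schema of a UNION over `inputs` (assumed semantics):
/// every input must have the same number of columns, the output column
/// names come from the first input, and the output carries no qualifier
/// because the rows no longer belong to any single source table.
fn union_schema(inputs: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let first = inputs
        .first()
        .ok_or_else(|| "UNION needs at least one input".to_string())?;

    // Reject inputs whose column count differs from the first input.
    for (i, input) in inputs.iter().enumerate().skip(1) {
        if input.len() != first.len() {
            return Err(format!(
                "input {} has {} columns, expected {}",
                i,
                input.len(),
                first.len()
            ));
        }
    }

    // Keep the first input's names, drop its qualifiers.
    Ok(first.iter().map(|f| Field::new(None, &f.name)).collect())
}
```

Passing the three projected schemas from the plan above (`a.b, a.c`, `b.b, b.c`, `c.b, c.c`) to `union_schema` would give back unqualified `b` and `c`, which matches the `qualifier: None` fields in the reported `DFSchema`.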