
Schema qualifiers missing on call to `LogicalPlan::schema`

Open jdye64 opened this issue 3 years ago • 2 comments

Describe the bug

When running the query

SELECT * FROM a
UNION SELECT * FROM b
UNION SELECT * FROM c
ORDER BY b NULLS FIRST, c NULLS FIRST

I noticed that the schema returned from `plan.schema()` has no qualifiers on its fields:

DFSchema { fields: [DFField { qualifier: None, field: Field { name: "b", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: Field { name: "c", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None } }], metadata: {} }

The explain output for the Union is:

Union
  Projection: #a.b, #a.c
    TableScan: a projection=[b, c]
  Projection: #b.b, #b.c
    TableScan: b projection=[b, c]
  Projection: #c.b, #c.c
    TableScan: c projection=[b, c]

To Reproduce

    #[test]
    fn union_schema_qualifier_missing() -> Result<()> {
        let schema = Schema::new(vec![
            Field::new("b", DataType::Int32, false),
            Field::new("c", DataType::Int32, false),
        ]);

        // Each input scans one table and projects both columns.
        let table_a = table_scan(Some("a"), &schema, Some(vec![0, 1]))?
            .project(vec![col("b"), col("c")])?;

        let table_b = table_scan(Some("b"), &schema, Some(vec![0, 1]))?
            .project(vec![col("b"), col("c")])?;

        let table_c = table_scan(Some("c"), &schema, Some(vec![0, 1]))?
            .project(vec![col("b"), col("c")])?;

        let union_plan = table_a
            .union(table_b.build()?)?
            .union(table_c.build()?)?
            .build()?;

        // Get the schema from the resulting logical plan and ensure it has qualifiers
        let schema = union_plan.schema();

        // Fails today: the first field's qualifier is None.
        assert_ne!(
            schema.fields()[0].qualifier(),
            None
        );

        Ok(())
    }

Expected behavior

The qualifiers should be present in the resulting schema.

Additional context

None

jdye64, Jul 19 '22 14:07

Am I wrong in thinking there should be a qualifier here? Maybe the absence of one is actually correct? ANSI SQL does require that all fields in a Union have the same names and the same order, so maybe it doesn't matter?

jdye64, Jul 19 '22 16:07

Without diving very deep: having no qualifier seems correct, because the results from the different tables (a, b, c) are combined into a new result set, which shouldn't have a qualifier.
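To make that reasoning concrete, here is a minimal toy model of it. This is only a sketch: `ToyField`, `scan_fields`, `union_fields`, and `demo_union` are hypothetical names invented for illustration and are not DataFusion's `DFField`/`DFSchema` types or its actual implementation.

```rust
// Toy model only: hypothetical types, NOT DataFusion's DFField/DFSchema.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>, // e.g. Some("a") for a column read from table `a`
    name: String,
}

/// Fields of a table scan: every column is qualified by the table name.
fn scan_fields(table: &str, columns: &[&str]) -> Vec<ToyField> {
    columns
        .iter()
        .map(|c| ToyField {
            qualifier: Some(table.to_string()),
            name: c.to_string(),
        })
        .collect()
}

/// Output fields of a union: rows come from every input, so no single
/// table name describes a column and the qualifier is dropped.
/// (Assumption of this sketch: names are taken from the first input.)
fn union_fields(inputs: &[Vec<ToyField>]) -> Vec<ToyField> {
    inputs
        .first()
        .map(|first| {
            first
                .iter()
                .map(|f| ToyField {
                    qualifier: None,
                    name: f.name.clone(),
                })
                .collect::<Vec<_>>()
        })
        .unwrap_or_default()
}

/// Mirrors the query in this issue: tables a, b, c, each with columns b and c.
fn demo_union() -> Vec<ToyField> {
    let inputs: Vec<Vec<ToyField>> = ["a", "b", "c"]
        .iter()
        .map(|t| scan_fields(t, &["b", "c"]))
        .collect();
    union_fields(&inputs)
}
```

In this model each scan carries its table name as the qualifier, while `demo_union()` yields the fields `b` and `c` with `qualifier: None`, which matches the shape of the `DFSchema` printed in the report.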

Dandandan, Jul 19 '22 21:07

The column names in the UNION result set are usually equal to the column names in the first SELECT statement in the UNION. Fixed by https://github.com/apache/arrow-datafusion/pull/5452. @alamb Can this issue be closed?
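As a rough illustration of the naming rule described here, the sketch below takes the output names from the first branch and rejects branches with a different column count. `union_column_names` and `demo_names` are hypothetical helpers written for this comment; they are not DataFusion APIs and do not reflect what the linked PR changed.

```rust
/// Hypothetical helper (not a DataFusion API): choose the output column names
/// of a UNION from its branches, each given as a list of column names.
fn union_column_names(branches: &[Vec<&str>]) -> Result<Vec<String>, String> {
    let first = branches
        .first()
        .ok_or_else(|| "UNION needs at least one input".to_string())?;

    // Every branch must produce the same number of columns as the first one.
    for (i, branch) in branches.iter().enumerate() {
        if branch.len() != first.len() {
            return Err(format!(
                "input {} has {} columns, expected {}",
                i,
                branch.len(),
                first.len()
            ));
        }
    }

    // The names of the later branches are ignored; the first branch wins.
    Ok(first.iter().map(|name| name.to_string()).collect())
}

/// Two branches with different column names: the result uses the first branch's names.
fn demo_names() -> Result<Vec<String>, String> {
    union_column_names(&[vec!["b", "c"], vec!["x", "y"]])
}
```

Here `demo_names()` returns the names `b` and `c` even though the second branch calls its columns `x` and `y`, and a branch with a different number of columns yields an error.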

yukkit, Apr 28 '23 09:04