datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Implement IGNORE NULLS for window functions

Open comphead opened this issue 1 year ago • 7 comments

Is your feature request related to a problem or challenge?

Now DataFusion lacks the IGNORE NULLS support for Window functions

DataFusion CLI v35.0.0
❯ select lag(a) ignore nulls over (order by a) from (select 'a' a union all select 'b' a);
This feature is not implemented: Null treatment in aggregate functions is not supported: IGNORE NULLS

Describe the solution you'd like

Add IGNORE NULLS support for:

  • [ ] LAG function
  • [ ] LEAD function
  • [ ] FIRST_VALUE
  • [ ] LAST_VALUE
  • [ ] NTH_VALUE

Describe alternatives you've considered

No response

Additional context

No response

comphead avatar Jan 29 '24 17:01 comphead

@sunchao @viirya cc

comphead avatar Jan 29 '24 19:01 comphead

@alamb are we okay to support such syntax in DF?

select first_value(a) ignore nulls over (partition by id order by id) from (select 1 id, 'a' a union all select 1 id, 'b' a union all select 1 id, null); 

Spark/Oracle supports it fully PG doesn't support such syntax at all. DuckDB supports slightly different syntax

select first_value(a ignore nulls) over (partition by id order by id) from (select 1 id, 'a' a union all select 1 id, 'b' a union all select 1 id, null); 

comphead avatar Jan 30 '24 21:01 comphead

Note that looks like Spark syntax follows SQL 2016 standard:

https://github.com/ronsavage/SQL/blob/19215adc8639d031a44acad7873c209444b71f1f/sql-2016.ebnf#L1397

lead_or_lag_function ::=
  lead_or_lag left_paren lead_or_lag_extent
      ( comma offset ( comma default_expression )? )? right_paren
      null_treatment?
...
null_treatment ::=
  RESPECT NULLS | IGNORE NULLS

I don't find this syntax in SQL 2003.

viirya avatar Jan 30 '24 21:01 viirya

Thanks @viirya Looks like currently the DF is able to parse the SQL 2016 standard syntax for this specific case, so it is probably worth to try implementing IGNORE NULLS support

@mustafasrepo cc if you have any opinions on that?

comphead avatar Jan 31 '24 16:01 comphead

It seems there is no defined opinion yet, I'll create a PR and we can discuss in details

comphead avatar Feb 02 '24 17:02 comphead

It makes sense to me -- maybe @ozankabak or @mustafasrepo have some thoughts in this area as well

alamb avatar Feb 02 '24 20:02 alamb

Thanks @viirya Looks like currently the DF is able to parse the SQL 2016 standard syntax for this specific case, so it is probably worth to try implementing IGNORE NULLS support

@mustafasrepo cc if you have any opinions on that?

I would be nice to have this feature given that we have parsing support already, and it is a standart feature. Thanks @comphead for bringing this up.

mustafasrepo avatar Feb 05 '24 07:02 mustafasrepo

Nice, everything is done, so closing it. thanks @mustafasrepo @huaxingao @viirya @alamb @ozankabak

comphead avatar Mar 21 '24 15:03 comphead