[BUG] SUM(NULL) unexpectedly returns 0
What is the bug? SUM(NULL) unexpectedly returns 0
How can one reproduce the bug? Steps to reproduce the behavior:
- Make a query SELECT SUM(NULL) from
- See the result as 0 instead of NULL
What is the expected behavior? The result should be NULL
What is your host/environment?
- OpenSearch 1.3.1
- See the result as 0 instead of NULL
@penghuo @joshuali925
The root cause is of this bug is how OpenSearch stores results of Sum aggregation. Would it be appropriate to update OpenSearch to return null instead of 0 if an aggregation did not process any rows?
Consider Avg and Sum aggregations in OpenSearch. Avg is stored as sum and count. When count = 0 the whole aggregation is evaluated as null producing the expected result. Sum, on the other hand, is not aware of how many documents were evaluated and so evaluates to the initial value -- 0. See InternalSum and InternalAvg for reference.
return
nullif an aggregation did not process any rows
Is this an existing standard? OpenSearch returns 0 as 0 is the default value which makes sense to me, maybe SQL standard is different.
Based on SQL code when no rows are processed it should already return null https://github.com/opensearch-project/sql/blob/b2d1a1630ed0f5d4f3000f3b6f5dd631fb898ea7/core/src/main/java/org/opensearch/sql/expression/aggregation/SumAggregator.java#L96-L98
For sum pushed down to OpenSearch side you can create a ticket there, but it will be a breaking change and not sure if there is enough justification.
@joshuali925 This argument is based on the SQL standard. Unfortunately, there doesn't appear to be a way to change the SQL plugin to match the SQL standard unless we move the aggregation processing to the SQL plugin side - which would be a significant change. This might be worth doing? since MIN/MAX/AVG/SUM/etc for SQL do behave slightly differently than their OpenSearch aggregation equivalents. For example: https://github.com/opensearch-project/sql/issues/279 is another issue where the OpenSearch aggregation function behaves differently than the SQL standard.
Yes - you are correct - this would be a breaking change on the OpenSearch side.
That being said, the counter argument is that AVG returns null when the count is 0. What is the justification for the two being different...? The sum of
@acarbonetto Not sure if we should move aggregations to SQL side. It would mean processing aggregations in memory which is not scaleable. Is it possible to make it an exception from the tdvt tests? My argument would be the plugin is opensearch-sql and we push down as much as possible, and the results will be consistent with opensearch.
That being said, the counter argument is that AVG returns null when the count is 0. What is the justification for the two being different...? The sum of or doesn't make sense to be 0 either.
The other way would be to convince OpenSearch to change behavior, i don't know if this argument is convincing enough to add a breaking change there, but we can try it if no other options.
(To answer the question I was referring to sum(), which i think it's standard for programming language libraries to return 0 if list is empty. But i could be wrong and it's not relevant in our case)