bigquery-utils
bigquery-utils copied to clipboard
Missing values are not taken into account in UDF "median".
Since this UDF does not take missing values into account in the calculation, I believe it needs to be included in "README.md" or the function needs to be modified.
Currently, when an array containing missing values is given, the length of the array is calculated as is. Therefore, in the following cases, unintended results may be returned. I think we need to add a function that arbitrarily removes or completes the missing values.
Current source code
SELECT IF (
MOD(ARRAY_LENGTH(arr), 2) = 0,
(arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
)
FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
Input
DECLARE arr ARRAY<INT64>;
SET arr = [1, 2, 3, 4, 5, 6, NULL];
Current result
3.0
The real result (Remove missing values)
3.5
So I would like to propose a process using the existing function "PERCENTILE_CONT" in the following form.
SELECT
PERCENTILE_CONT(val, 0.5) OVER()
FROM UNNEST(arr) AS val
LIMIT 1