doris icon indicating copy to clipboard operation
doris copied to clipboard

[Feature] Support some array function

Open emerkfu opened this issue 3 years ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Description

Now Doris has supported the window_funnel function. The window_funnel function can calculate the depth of the ordered window funnel. If only the depth of the funnel is used, it is still difficult to calculate the conversion rate of each layer of funnel. If the following three functions can be supported, the calculation of the conversion rate of each layer of funnel It will be easy to support, and the calculation of retention analysis can also be implemented.

array_with_constant(length, param)
array_enumerate(arr)
array_join(arr)

Use case

array_with_constant(length, param)

description

Syntax
ARRAY<T> array_with_constant(BIGINT length, T param)

Returns an array of the specified length. length: The length of the array. param: filled content.

example

mysql> select k1, k2, array_with_constant(k1, k2) from array_with_constant_demo_table;
+-----------------+-----------------+-----------------------------+
| k1              | k2              | array_with_constant(k1, k2) |
+-----------------+-----------------+-----------------------------+
| 4               | 1               | [1,1,1,1]                   |
+-----------------+-----------------+-----------------------------+

array_enumerate(arr)

description

Syntax
ARRAY<T> array_enumerate(ARRAY<T> arr)

Returns an array of subscript values of the target array.

example

mysql> select k1, array_enumerate(k1) from array_enumerate_demo_table;
+-----------------+---------------------+
| k1              | array_enumerate(k1) |
+-----------------+---------------------+
| [1, 1, 1, 1]    | [1,2,3,4]           |
+-----------------+---------------------+

array_join(arr)

description

Syntax
T array_join(ARRAY<T> arr)

Returns the expanded result of an array into multiple lines.

example

mysql> select k1, k2, array_join(k2) from array_join_demo_table;
+-----------------+---------------------+----------------+
| k1              | k2                  | array_join(k2) |
+-----------------+---------------------+----------------+
| a1              | [1,2,3,4]           | 1              |
| a1              | [1,2,3,4]           | 2              |
| a1              | [1,2,3,4]           | 3              |
| a1              | [1,2,3,4]           | 4              |
+-----------------+---------------------+----------------+

Related issues

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

emerkfu avatar Aug 05 '22 08:08 emerkfu

Hi, did you use these array functions before? If yes, what database did you use these functions on before? Hope to be provided more use cases, thanks.

xy720 avatar Aug 08 '22 09:08 xy720

Hi, did you use these array functions before? If yes, what database did you use these functions on before? Hope to be provided more use cases, thanks.

I use the above three functions on ClickHouse.

Below is an example of how we used ClickHouse to calculate conversion rates at each level of an ordered window funnel.

1. The funnel process
    login_in -> $pageview -> $end -> $startup -> $end -> $startup
2. Time range 2021-07-01 ~ 2021-07-09
3. Conversion cycle 1 day
CREATE TABLE IF NOT EXISTS test.`event`(
 `distinct_id` Nullable(Int64), 
 `xwhen` Nullable(Int64), 
 `xwhat` Nullable(String)
 ) engine = TinyLog;


SELECT level_index,count(1) FROM
(
    SELECT  distinct_id,
        arrayWithConstant(level, 1) levels,
        arrayEnumerate( levels ) b, -- for_test_col
        arrayJoin(arrayEnumerate( levels )) level_index
      FROM (
        SELECT
          distinct_id,
          windowFunnel(86400)(
            time,
            xwhat = 'login_in',
            xwhat = '$pageview',
            xwhat = '$end',
            xwhat = '$startup',
            xwhat = '$end',
            xwhat = '$startup'
          ) AS level
        FROM (
          SELECT
          distinct_id
          , xwhat
          , toDateTime(toUInt64(round(xwhen/1000))) as time
          FROM test.event
          WHERE (toDate(time) >= '2021-07-01' AND toDate(time) <= '2021-07-09' AND xwhat = 'login_in')
          OR (toDate(time) >= '2021-07-01' AND toDate(time) <= '2021-07-10' AND xwhat IN ('$pageview','$end','$startup'))
        )
        GROUP BY distinct_id
    )
)
GROUP BY level_index
ORDER BY level_index;

emerkfu avatar Aug 09 '22 02:08 emerkfu

@xy720 Hi, what do you think of the above features? Can they be implemented in future versions? Looking forward to your reply, thanks.

emerkfu avatar Aug 27 '22 04:08 emerkfu

@emerkfu Sorry, we haven't put these functions on our first stage agenda. Because these functions have some conflicts with the functions we are developing now. For example: The array_join function is different as our array_join, but more like our explode function. (See help explode in mysql client) We will consider support array_with_constant and array_enumerate in the future version.

xy720 avatar Aug 29 '22 04:08 xy720

all this function have support

  1. array_with_constant function : https://github.com/apache/doris/pull/14115
  2. array_enumerate function : https://github.com/apache/doris/pull/13612
  3. array_join function :https://github.com/apache/doris/pull/8766

mysql> select * from arrays_test; +---------+-----------+ | s | arr | +---------+-----------+ | Goodbye | [] | | Hello | [1, 2] | | World | [3, 4, 5] | +---------+-----------+ 3 rows in set (0.01 sec)

select s,explode_array from arrays_test lateral view explode_outer(arr) temp as explode_array; +---------+---------------+ | s | explode_array | +---------+---------------+ | Goodbye | NULL | | Hello | 1 | | Hello | 2 | | World | 3 | | World | 4 | | World | 5 | +---------+---------------+ 6 rows in set (0.01 sec)

LOVEGISER avatar Dec 01 '22 11:12 LOVEGISER