[Performance]: rollup and time window function: aggregate before sort
Is there an existing issue for performance?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93):
- Hardware parameters:
- OS type:
- Others:
Details of Performance
We need to implement efficient processing of rollup and the time window function. In both cases we should run the aggregation first (call it phase 1), then sort, and then process the sorted aggregates in phase 2. We could also use hash aggregation in phase 2, but in general sorting once is easier and better for these two cases, and sorted output is usually what users expect anyway.
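To make the two phases concrete, here is a minimal Python sketch of the idea. It is not MatrixOne's implementation. The function names `window_agg` and `rollup_sum`, the tuple layouts, and the use of integer epoch seconds for timestamps are all assumptions made for illustration. Phase 1 hash-aggregates the raw rows, the much smaller set of groups is then sorted, and phase 2 walks the sorted groups once.

```python
from collections import defaultdict


def window_agg(rows, width):
    """Hypothetical helper: rows are (ts_seconds, temperature, humidity) tuples.

    Returns (wstart, wend, max_temperature, min_humidity, sum_humidity)
    per fixed-width window, ordered by window start.
    """
    # Phase 1: hash-aggregate raw rows into one partial state per window.
    partial = {}
    for ts, temp, hum in rows:
        start = ts - ts % width  # start of the window containing ts
        state = partial.get(start)
        if state is None:
            partial[start] = [temp, hum, hum]
        else:
            state[0] = max(state[0], temp)
            state[1] = min(state[1], hum)
            state[2] += hum
    # Sort the aggregated windows (not the raw rows), then
    # phase 2: emit them in window order.
    return [(s, s + width, *partial[s]) for s in sorted(partial)]


def rollup_sum(rows):
    """Hypothetical helper: rows are (a, b, value) tuples with non-null keys.

    Returns SUM(value) per (a, b), a subtotal row per a, and a grand total,
    in the spirit of GROUP BY a, b WITH ROLLUP.
    """
    # Phase 1: hash-aggregate on the finest grouping (a, b).
    partial = defaultdict(int)
    for a, b, v in rows:
        partial[(a, b)] += v
    # Sort the aggregated groups once.
    groups = sorted(partial.items())
    # Phase 2: one pass over the sorted groups, closing a subtotal
    # whenever the leading key changes.
    out = []
    grand = 0
    cur_a, sub = None, 0
    for i, ((a, b), s) in enumerate(groups):
        if i > 0 and a != cur_a:
            out.append((cur_a, None, sub))
            sub = 0
        cur_a = a
        out.append((a, b, s))
        sub += s
        grand += s
    if groups:
        out.append((cur_a, None, sub))
        out.append((None, None, grand))
    return out
```

The point of the ordering is that the sort only sees one entry per group or window, so its cost depends on the number of groups rather than the number of input rows.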
Additional information
For example: Have you compared MatrixOne with other databases? If yes, what's their difference?
to be tested
Performance will be recorded in this thread: https://github.com/matrixorigin/MO-Cloud/issues/4307
https://github.com/matrixorigin/MO-Cloud/issues/4307#issuecomment-2461199214
rollup already aggregates before sorting, so no change is needed there. Performance comparison for time window after the optimization, with 10 million rows inserted:
ddl: CREATE TABLE IF NOT EXISTS tb_test ( ts TIMESTAMP primary key, temperature FLOAT, humidity INT );
sql: select _wstart, _wend, max(temperature), min(humidity), sum(humidity) from tb_test interval(ts, 100, second);
commit: ae8f3607ba76c95a4828a8ec94d1ff91dd354ccb
commit: 4cfb6bc5701a798989efdec7877829947cb307e4
After the optimization, performance improves by an order of magnitude on large data volumes.
Confirmed, closed.