datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Only update TopK dynamic filters if the new ones are more selective

Open adriangb opened this issue 8 months ago • 2 comments

Closes #16432

The idea here is to introduce a global thresholds reference that gets updated across all partitions. This could drastically speed up early termination.

adriangb avatar Jun 17 '25 22:06 adriangb

It seems in some cases it's faster:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ topk-dynamic-filter ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │            16.17 ms │     16.89 ms │     no change │
│ Q2           │            27.60 ms │     20.22 ms │ +1.36x faster │
│ Q3           │            55.41 ms │     54.45 ms │     no change │
│ Q4           │            22.26 ms │     22.07 ms │     no change │
│ Q5           │            11.98 ms │     12.10 ms │     no change │
│ Q6           │            26.25 ms │     27.26 ms │     no change │
│ Q7           │            71.56 ms │     67.86 ms │ +1.05x faster │
│ Q8           │            90.02 ms │     46.15 ms │ +1.95x faster │
│ Q9           │            61.64 ms │     58.91 ms │     no change │
│ Q10          │           101.13 ms │     98.24 ms │     no change │
│ Q11          │            58.43 ms │     56.85 ms │     no change │
└──────────────┴─────────────────────┴──────────────┴───────────────┘

And in other cases it's mixed / slower:

┃ Query        ┃ topk-dynamic-filter ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │            14.00 ms │     13.07 ms │ +1.07x faster │
│ QQuery 1     │            20.55 ms │     18.81 ms │ +1.09x faster │
│ QQuery 2     │            56.58 ms │     56.33 ms │     no change │
│ QQuery 3     │            56.30 ms │     54.94 ms │     no change │
│ QQuery 4     │           468.00 ms │    516.71 ms │  1.10x slower │
│ QQuery 5     │           626.85 ms │    659.29 ms │  1.05x slower │
│ QQuery 6     │            15.31 ms │     15.26 ms │     no change │
│ QQuery 7     │            30.44 ms │     21.48 ms │ +1.42x faster │
│ QQuery 8     │           547.48 ms │    569.67 ms │     no change │
│ QQuery 9     │           744.50 ms │    778.67 ms │     no change │
│ QQuery 10    │           182.53 ms │    176.03 ms │     no change │
│ QQuery 11    │           217.10 ms │    190.23 ms │ +1.14x faster │
│ QQuery 12    │           832.33 ms │    715.38 ms │ +1.16x faster │
│ QQuery 13    │          1102.37 ms │    914.96 ms │ +1.20x faster │
│ QQuery 14    │           823.97 ms │    576.94 ms │ +1.43x faster │
│ QQuery 15    │           574.53 ms │    496.51 ms │ +1.16x faster │
│ QQuery 16    │          1324.12 ms │   1150.68 ms │ +1.15x faster │
│ QQuery 17    │          1324.29 ms │   1245.55 ms │ +1.06x faster │
│ QQuery 18    │          2603.07 ms │   2248.86 ms │ +1.16x faster │
│ QQuery 19    │            48.17 ms │     49.60 ms │     no change │
│ QQuery 20    │           978.10 ms │    964.90 ms │     no change │
│ QQuery 21    │          1021.47 ms │   1050.51 ms │     no change │
│ QQuery 22    │          1958.07 ms │   1660.81 ms │ +1.18x faster │
│ QQuery 23    │           760.66 ms │   5353.24 ms │  7.04x slower │
│ QQuery 24    │           525.78 ms │    342.39 ms │ +1.54x faster │
│ QQuery 25    │           480.36 ms │    278.59 ms │ +1.72x faster │
│ QQuery 26    │           542.13 ms │    342.32 ms │ +1.58x faster │
│ QQuery 27    │          1826.05 ms │   1260.73 ms │ +1.45x faster │
│ QQuery 28    │         11094.31 ms │  10685.64 ms │     no change │
│ QQuery 29    │           427.18 ms │    432.16 ms │     no change │
│ QQuery 30    │           826.43 ms │    539.39 ms │ +1.53x faster │
│ QQuery 31    │           789.89 ms │    553.49 ms │ +1.43x faster │
│ QQuery 32    │          2580.93 ms │   2339.06 ms │ +1.10x faster │
│ QQuery 33    │          2660.40 ms │   2761.14 ms │     no change │
│ QQuery 34    │          2948.08 ms │   2923.44 ms │     no change │
│ QQuery 35    │           886.56 ms │    810.35 ms │ +1.09x faster │
│ QQuery 36    │            18.59 ms │     83.97 ms │  4.52x slower │
│ QQuery 37    │            18.67 ms │     35.52 ms │  1.90x slower │
│ QQuery 38    │            18.10 ms │     81.86 ms │  4.52x slower │
│ QQuery 39    │            17.89 ms │    135.04 ms │  7.55x slower │
│ QQuery 40    │            18.30 ms │     27.70 ms │  1.51x slower │
│ QQuery 41    │            18.15 ms │     26.07 ms │  1.44x slower │
│ QQuery 42    │            17.78 ms │     22.56 ms │  1.27x slower │
└──────────────┴─────────────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓

Dandandan avatar Jun 18 '25 22:06 Dandandan

Seems like a bug in my implementation right? I'd be surprised if the update checks I added are that heavy compared to other work...

adriangb avatar Jun 18 '25 22:06 adriangb

Hm my earlier benchmarks didn't seem correct. not sure where the earlier run came from 🤔

Dandandan avatar Jun 19 '25 05:06 Dandandan

In hindsight I think actually another fact that we don't see topk being as effective with more partitions is that spreading them over partitions will essentially make topk(n) into a topk(n*partitions). So a limit of 10 will be 200 with 20 partitions, and therefore will reduce the selectivity of the filter.

Synchronizing based on the max of each heap will roughly transform it to a heap with n*partitions - partitions, so it doesn't help that much with better selectivity.

Dandandan avatar Jun 19 '25 09:06 Dandandan

(Removed a non-working proposal).

I am wondering if we can somehow synchronize the values in the heap efficiently in order to make the filter for a topk(n*partitions) as efficient as topk(n) 🤔

Dandandan avatar Jun 19 '25 09:06 Dandandan

I am wondering if we can somehow synchronize the values in the heap efficiently in order to make the filter for a topk(n*partitions) as efficient as topk(n) 🤔

I think the only way to do that would be to make a global heap and put a lock on it, similar to what we do with the filters?

adriangb avatar Jun 19 '25 12:06 adriangb

Hm my earlier benchmarks didn't seem correct. not sure where the earlier run came from 🤔

What do the current ones show? Not much improvement?

adriangb avatar Jun 19 '25 12:06 adriangb

Hm my earlier benchmarks didn't seem correct. not sure where the earlier run came from 🤔

What do the current ones show? Not much improvement?

Yes about the same compared to main.

Dandandan avatar Jun 19 '25 14:06 Dandandan

We could try a shared heap. It might work? I guess it will be a sort of balance between lock contention and better selectivity. Maybe we can balance it by having distinct heaps for writes with no locks but read only references to all of them so that when we do reads we compute on the fly the "combined" heap? Then we don't need any locks. The cost is that computations on the heap are larger but as long as k ~ constant then it should be fine.

adriangb avatar Jun 19 '25 14:06 adriangb

We could try a shared heap. It might work? I guess it will be a sort of balance between lock contention and better selectivity. Maybe we can balance it by having distinct heaps for writes with no locks but read only references to all of them so that when we do reads we compute on the fly the "combined" heap? Then we don't need any locks. The cost is that computations on the heap are larger but as long as k ~ constant then it should be fine.

how would you compute the shared heap on the fly?

I was thinking something similar: each write to a own heap, only write the ones that updated the local heap to the shared heap (limiting the lock access time).

Dandandan avatar Jun 19 '25 23:06 Dandandan

how would you compute the shared heap on the fly?

I was thinking you'd compute the top K of the top K * partitions on the fly.

But maybe your proposal makes more sense.

adriangb avatar Jun 19 '25 23:06 adriangb

Marking as draft until we sort out #16501

adriangb avatar Jun 27 '25 18:06 adriangb

I did a bench run, confounding results:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  9.96 ms │     10.70 ms │  1.07x slower │
│ Q2           │ 12.63 ms │     15.30 ms │  1.21x slower │
│ Q3           │ 33.45 ms │     35.22 ms │  1.05x slower │
│ Q4           │ 15.22 ms │     16.82 ms │  1.11x slower │
│ Q5           │  9.87 ms │      8.85 ms │ +1.12x faster │
│ Q6           │ 16.54 ms │     22.16 ms │  1.34x slower │
│ Q7           │ 43.41 ms │     44.00 ms │     no change │
│ Q8           │ 22.91 ms │     34.56 ms │  1.51x slower │
│ Q9           │ 38.68 ms │     39.60 ms │     no change │
│ Q10          │ 65.73 ms │     73.24 ms │  1.11x slower │
│ Q11          │ 34.78 ms │     37.14 ms │  1.07x slower │
└──────────────┴──────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)           │ 303.17ms │
│ Total Time (topk-filters)   │ 337.58ms │
│ Average Time (main)         │  27.56ms │
│ Average Time (topk-filters) │  30.69ms │
│ Queries Faster              │        1 │
│ Queries Slower              │        8 │
│ Queries with No Change      │        2 │
│ Queries with Failure        │        0 │
└─────────────────────────────┴──────────┘

Mostly... slower

adriangb avatar Jul 04 '25 12:07 adriangb

I am taking a look now, see if I can find a thing

Dandandan avatar Jul 04 '25 20:07 Dandandan

I tried some stuff. Still not an obvious improvement:

/bench.sh compare main topk-filters
Comparing main and topk-filters
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │ 12.46 ms │     10.55 ms │ +1.18x faster │
│ Q2           │ 14.02 ms │     12.06 ms │ +1.16x faster │
│ Q3           │ 29.39 ms │     30.10 ms │     no change │
│ Q4           │ 13.74 ms │     12.79 ms │ +1.07x faster │
│ Q5           │  7.44 ms │      7.09 ms │     no change │
│ Q6           │ 15.87 ms │     13.79 ms │ +1.15x faster │
│ Q7           │ 37.22 ms │     44.24 ms │  1.19x slower │
│ Q8           │ 27.51 ms │     25.17 ms │ +1.09x faster │
│ Q9           │ 33.04 ms │     31.28 ms │ +1.06x faster │
│ Q10          │ 55.26 ms │     58.96 ms │  1.07x slower │
│ Q11          │ 28.74 ms │     29.75 ms │     no change │
└──────────────┴──────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)           │ 274.68ms │
│ Total Time (topk-filters)   │ 275.79ms │
│ Average Time (main)         │  24.97ms │
│ Average Time (topk-filters) │  25.07ms │
│ Queries Faster              │        6 │
│ Queries Slower              │        2 │
│ Queries with No Change      │        3 │
│ Queries with Failure        │        0 │
└─────────────────────────────┴──────────┘

adriangb avatar Aug 13 '25 16:08 adriangb

ClickBench results for partitioned datasets look much better, the only query that is truly slower is a very fast one anyway (so probably noise). But also it seems like most of these should not be faster 🤔 so maybe my measurement is off.

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    0.92 ms │      1.15 ms │  1.25x slower │
│ QQuery 1     │   12.45 ms │     11.01 ms │ +1.13x faster │
│ QQuery 2     │   42.41 ms │     35.19 ms │ +1.21x faster │
│ QQuery 3     │   40.63 ms │     33.55 ms │ +1.21x faster │
│ QQuery 4     │  337.67 ms │    267.36 ms │ +1.26x faster │
│ QQuery 5     │  434.44 ms │    362.25 ms │ +1.20x faster │
│ QQuery 6     │    1.21 ms │      1.05 ms │ +1.15x faster │
│ QQuery 7     │   14.20 ms │     11.94 ms │ +1.19x faster │
│ QQuery 8     │  413.55 ms │    317.13 ms │ +1.30x faster │
│ QQuery 9     │  601.85 ms │    480.06 ms │ +1.25x faster │
│ QQuery 10    │  100.43 ms │     84.75 ms │ +1.18x faster │
│ QQuery 11    │  108.91 ms │     91.61 ms │ +1.19x faster │
│ QQuery 12    │  419.13 ms │    323.82 ms │ +1.29x faster │
│ QQuery 13    │  570.70 ms │    481.27 ms │ +1.19x faster │
│ QQuery 14    │  428.36 ms │    310.63 ms │ +1.38x faster │
│ QQuery 15    │  426.88 ms │    309.22 ms │ +1.38x faster │
│ QQuery 16    │ 1251.45 ms │    729.69 ms │ +1.72x faster │
│ QQuery 17    │ 1175.59 ms │    738.35 ms │ +1.59x faster │
│ QQuery 18    │ 2670.84 ms │   1945.33 ms │ +1.37x faster │
│ QQuery 19    │   41.71 ms │     32.77 ms │ +1.27x faster │
│ QQuery 20    │ 1002.98 ms │    738.13 ms │ +1.36x faster │
│ QQuery 21    │  969.39 ms │    763.04 ms │ +1.27x faster │
│ QQuery 22    │ 1506.05 ms │   1214.86 ms │ +1.24x faster │
│ QQuery 23    │ 4418.24 ms │   4008.72 ms │ +1.10x faster │
│ QQuery 24    │  198.37 ms │    196.65 ms │     no change │
│ QQuery 25    │  150.97 ms │    150.72 ms │     no change │
│ QQuery 26    │  200.45 ms │    191.65 ms │     no change │
│ QQuery 27    │ 1007.57 ms │    893.24 ms │ +1.13x faster │
│ QQuery 28    │ 7214.47 ms │   7065.58 ms │     no change │
│ QQuery 29    │  356.81 ms │    327.34 ms │ +1.09x faster │
│ QQuery 30    │  323.66 ms │    310.25 ms │     no change │
│ QQuery 31    │  366.94 ms │    346.37 ms │ +1.06x faster │
│ QQuery 32    │ 2028.11 ms │   1692.80 ms │ +1.20x faster │
│ QQuery 33    │ 2286.38 ms │   1775.46 ms │ +1.29x faster │
│ QQuery 34    │ 1978.34 ms │   2305.40 ms │  1.17x slower │
│ QQuery 35    │  555.06 ms │    552.82 ms │     no change │
│ QQuery 36    │   61.16 ms │     62.54 ms │     no change │
│ QQuery 37    │   24.59 ms │     23.99 ms │     no change │
│ QQuery 38    │   61.70 ms │     63.87 ms │     no change │
│ QQuery 39    │  102.67 ms │    103.98 ms │     no change │
│ QQuery 40    │   17.16 ms │     18.43 ms │  1.07x slower │
│ QQuery 41    │   16.16 ms │     17.05 ms │  1.06x slower │
│ QQuery 42    │   12.72 ms │     13.14 ms │     no change │
└──────────────┴────────────┴──────────────┴───────────────┘

adriangb avatar Aug 13 '25 17:08 adriangb

Okay I've re-run after 310100b04 and 0406770e9 and am now seeing only improvements across the board. @Dandandan I've requested another review from you since I think this may be in a good place now.

Comparing main and topk-filters
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    1.02 ms │      1.09 ms │  1.07x slower │
│ QQuery 1     │   10.91 ms │     10.86 ms │     no change │
│ QQuery 2     │   35.70 ms │     35.36 ms │     no change │
│ QQuery 3     │   34.36 ms │     34.19 ms │     no change │
│ QQuery 4     │  262.99 ms │    274.64 ms │     no change │
│ QQuery 5     │  352.01 ms │    349.98 ms │     no change │
│ QQuery 6     │    0.94 ms │      0.89 ms │ +1.05x faster │
│ QQuery 7     │   12.28 ms │     12.10 ms │     no change │
│ QQuery 8     │  325.15 ms │    315.93 ms │     no change │
│ QQuery 9     │  484.79 ms │    488.97 ms │     no change │
│ QQuery 10    │   83.34 ms │     84.09 ms │     no change │
│ QQuery 11    │   94.43 ms │     95.31 ms │     no change │
│ QQuery 12    │  328.80 ms │    330.22 ms │     no change │
│ QQuery 13    │  474.79 ms │    479.54 ms │     no change │
│ QQuery 14    │  306.84 ms │    312.43 ms │     no change │
│ QQuery 15    │  317.85 ms │    360.54 ms │  1.13x slower │
│ QQuery 16    │  727.99 ms │    775.94 ms │  1.07x slower │
│ QQuery 17    │  735.20 ms │    706.36 ms │     no change │
│ QQuery 18    │ 1758.82 ms │   1707.14 ms │     no change │
│ QQuery 19    │   32.96 ms │     28.91 ms │ +1.14x faster │
│ QQuery 20    │  744.06 ms │    688.16 ms │ +1.08x faster │
│ QQuery 21    │  765.40 ms │    742.52 ms │     no change │
│ QQuery 22    │ 1187.61 ms │   1165.92 ms │     no change │
│ QQuery 23    │ 3926.39 ms │   3559.50 ms │ +1.10x faster │
│ QQuery 24    │  197.26 ms │    197.36 ms │     no change │
│ QQuery 25    │  153.29 ms │    153.77 ms │     no change │
│ QQuery 26    │  200.05 ms │    196.67 ms │     no change │
│ QQuery 27    │  873.53 ms │    865.01 ms │     no change │
│ QQuery 28    │ 6789.19 ms │   6814.33 ms │     no change │
│ QQuery 29    │  329.61 ms │    345.40 ms │     no change │
│ QQuery 30    │  308.34 ms │    305.14 ms │     no change │
│ QQuery 31    │  342.00 ms │    312.30 ms │ +1.10x faster │
│ QQuery 32    │ 1601.46 ms │   1559.97 ms │     no change │
│ QQuery 33    │ 1688.74 ms │   1599.86 ms │ +1.06x faster │
│ QQuery 34    │ 1808.21 ms │   1656.78 ms │ +1.09x faster │
│ QQuery 35    │  504.57 ms │    491.27 ms │     no change │
│ QQuery 36    │   63.43 ms │     60.57 ms │     no change │
│ QQuery 37    │   23.80 ms │     23.91 ms │     no change │
│ QQuery 38    │   61.87 ms │     59.87 ms │     no change │
│ QQuery 39    │  104.07 ms │    101.26 ms │     no change │
│ QQuery 40    │   18.40 ms │     17.14 ms │ +1.07x faster │
│ QQuery 41    │   17.29 ms │     16.09 ms │ +1.07x faster │
│ QQuery 42    │   14.18 ms │     13.06 ms │ +1.09x faster │
└──────────────┴────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)           │ 28103.91ms │
│ Total Time (topk-filters)   │ 27350.36ms │
│ Average Time (main)         │   653.58ms │
│ Average Time (topk-filters) │   636.05ms │
│ Queries Faster              │         10 │
│ Queries Slower              │          3 │
│ Queries with No Change      │         30 │
│ Queries with Failure        │          0 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │ 11.22 ms │      4.75 ms │ +2.36x faster │
│ Q2           │ 13.90 ms │      8.78 ms │ +1.58x faster │
│ Q3           │ 33.71 ms │     30.13 ms │ +1.12x faster │
│ Q4           │ 13.81 ms │      9.02 ms │ +1.53x faster │
│ Q5           │  8.65 ms │      8.68 ms │     no change │
│ Q6           │ 14.67 ms │     16.06 ms │  1.09x slower │
│ Q7           │ 45.67 ms │     41.05 ms │ +1.11x faster │
│ Q8           │ 27.73 ms │     16.93 ms │ +1.64x faster │
│ Q9           │ 35.25 ms │     21.90 ms │ +1.61x faster │
│ Q10          │ 56.27 ms │     37.45 ms │ +1.50x faster │
│ Q11          │ 32.06 ms │      5.87 ms │ +5.46x faster │
└──────────────┴──────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)           │ 292.93ms │
│ Total Time (topk-filters)   │ 200.61ms │
│ Average Time (main)         │  26.63ms │
│ Average Time (topk-filters) │  18.24ms │
│ Queries Faster              │        9 │
│ Queries Slower              │        1 │
│ Queries with No Change      │        1 │
│ Queries with Failure        │        0 │
└─────────────────────────────┴──────────┘

adriangb avatar Aug 16 '25 19:08 adriangb

🤖 ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing topk-filters (b7ac320c0d9fcc72f2bb3750cf19ea7be12ff320) to 8f15991f33bf6aca9d4da8958141b59d196b2ed6 diff using: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete

alamb avatar Aug 18 '25 15:08 alamb

🤖: Benchmark completed

Details

Comparing HEAD and topk-filters
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2098.75 ms │   2062.77 ms │     no change │
│ QQuery 1     │   736.19 ms │    636.32 ms │ +1.16x faster │
│ QQuery 2     │  1447.86 ms │   1301.32 ms │ +1.11x faster │
│ QQuery 3     │   640.39 ms │    605.76 ms │ +1.06x faster │
│ QQuery 4     │  1297.18 ms │   1346.75 ms │     no change │
│ QQuery 5     │ 13751.05 ms │  14220.36 ms │     no change │
│ QQuery 6     │  2400.54 ms │   2335.19 ms │     no change │
│ QQuery 7     │  1933.90 ms │   2000.55 ms │     no change │
└──────────────┴─────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 24305.85ms │
│ Total Time (topk-filters)   │ 24509.01ms │
│ Average Time (HEAD)         │  3038.23ms │
│ Average Time (topk-filters) │  3063.63ms │
│ Queries Faster              │          3 │
│ Queries Slower              │          0 │
│ Queries with No Change      │          5 │
│ Queries with Failure        │          0 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.29 ms │      2.43 ms │  1.06x slower │
│ QQuery 1     │    28.83 ms │     29.37 ms │     no change │
│ QQuery 2     │    75.39 ms │     76.90 ms │     no change │
│ QQuery 3     │    88.02 ms │     90.54 ms │     no change │
│ QQuery 4     │   585.22 ms │    605.52 ms │     no change │
│ QQuery 5     │   849.36 ms │    886.24 ms │     no change │
│ QQuery 6     │     2.32 ms │      2.36 ms │     no change │
│ QQuery 7     │    31.99 ms │     33.28 ms │     no change │
│ QQuery 8     │   830.49 ms │    866.81 ms │     no change │
│ QQuery 9     │  1110.42 ms │   1160.29 ms │     no change │
│ QQuery 10    │   221.99 ms │    221.95 ms │     no change │
│ QQuery 11    │   255.21 ms │    260.43 ms │     no change │
│ QQuery 12    │   826.88 ms │    862.45 ms │     no change │
│ QQuery 13    │  1186.96 ms │   1159.18 ms │     no change │
│ QQuery 14    │   767.58 ms │    799.01 ms │     no change │
│ QQuery 15    │   717.27 ms │    794.54 ms │  1.11x slower │
│ QQuery 16    │  1552.61 ms │   1604.86 ms │     no change │
│ QQuery 17    │  1542.14 ms │   1589.89 ms │     no change │
│ QQuery 18    │  2895.45 ms │   2868.94 ms │     no change │
│ QQuery 19    │    79.79 ms │     81.57 ms │     no change │
│ QQuery 20    │  1278.75 ms │   1211.54 ms │ +1.06x faster │
│ QQuery 21    │  1374.67 ms │   1346.29 ms │     no change │
│ QQuery 22    │  2277.06 ms │   2246.93 ms │     no change │
│ QQuery 23    │  7751.39 ms │   6855.54 ms │ +1.13x faster │
│ QQuery 24    │   423.18 ms │    406.23 ms │     no change │
│ QQuery 25    │   294.80 ms │    289.58 ms │     no change │
│ QQuery 26    │   421.41 ms │    404.45 ms │     no change │
│ QQuery 27    │  1609.35 ms │   1548.29 ms │     no change │
│ QQuery 28    │ 12027.18 ms │  11975.67 ms │     no change │
│ QQuery 29    │   499.72 ms │    516.83 ms │     no change │
│ QQuery 30    │   762.44 ms │    768.76 ms │     no change │
│ QQuery 31    │   788.02 ms │    800.18 ms │     no change │
│ QQuery 32    │  2404.71 ms │   2486.31 ms │     no change │
│ QQuery 33    │  3305.56 ms │   3284.68 ms │     no change │
│ QQuery 34    │  3362.30 ms │   3402.04 ms │     no change │
│ QQuery 35    │  1299.53 ms │   1264.01 ms │     no change │
│ QQuery 36    │   133.03 ms │    130.08 ms │     no change │
│ QQuery 37    │    54.36 ms │     56.59 ms │     no change │
│ QQuery 38    │   124.72 ms │    124.72 ms │     no change │
│ QQuery 39    │   210.60 ms │    203.82 ms │     no change │
│ QQuery 40    │    46.04 ms │     42.45 ms │ +1.08x faster │
│ QQuery 41    │    39.97 ms │     40.83 ms │     no change │
│ QQuery 42    │    33.44 ms │     33.46 ms │     no change │
└──────────────┴─────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 54172.42ms │
│ Total Time (topk-filters)   │ 53435.87ms │
│ Average Time (HEAD)         │  1259.82ms │
│ Average Time (topk-filters) │  1242.69ms │
│ Queries Faster              │          3 │
│ Queries Slower              │          2 │
│ Queries with No Change      │         38 │
│ Queries with Failure        │          0 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  99.94 ms │    101.05 ms │     no change │
│ QQuery 2     │  20.73 ms │     20.97 ms │     no change │
│ QQuery 3     │  32.03 ms │     31.87 ms │     no change │
│ QQuery 4     │  18.61 ms │     18.43 ms │     no change │
│ QQuery 5     │  49.34 ms │     49.64 ms │     no change │
│ QQuery 6     │  12.16 ms │     11.98 ms │     no change │
│ QQuery 7     │  90.46 ms │     87.64 ms │     no change │
│ QQuery 8     │  25.89 ms │     24.74 ms │     no change │
│ QQuery 9     │  53.51 ms │     53.00 ms │     no change │
│ QQuery 10    │  42.89 ms │     40.12 ms │ +1.07x faster │
│ QQuery 11    │  40.37 ms │     38.06 ms │ +1.06x faster │
│ QQuery 12    │  31.03 ms │     29.84 ms │     no change │
│ QQuery 13    │  26.19 ms │     25.31 ms │     no change │
│ QQuery 14    │  10.26 ms │      9.73 ms │ +1.05x faster │
│ QQuery 15    │  19.66 ms │     19.11 ms │     no change │
│ QQuery 16    │  17.74 ms │     17.35 ms │     no change │
│ QQuery 17    │  96.68 ms │     95.82 ms │     no change │
│ QQuery 18    │ 188.86 ms │    186.57 ms │     no change │
│ QQuery 19    │  23.27 ms │     23.38 ms │     no change │
│ QQuery 20    │  30.53 ms │     32.85 ms │  1.08x slower │
│ QQuery 21    │ 138.68 ms │    138.00 ms │     no change │
│ QQuery 22    │  14.17 ms │     13.91 ms │     no change │
└──────────────┴───────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 1082.97ms │
│ Total Time (topk-filters)   │ 1069.38ms │
│ Average Time (HEAD)         │   49.23ms │
│ Average Time (topk-filters) │   48.61ms │
│ Queries Faster              │         3 │
│ Queries Slower              │         1 │
│ Queries with No Change      │        18 │
│ Queries with Failure        │         0 │
└─────────────────────────────┴───────────┘

alamb avatar Aug 18 '25 16:08 alamb

🤖 ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing topk-filters (3945aba663062debc24757c857a8bd679ad8b223) to 8f15991f33bf6aca9d4da8958141b59d196b2ed6 diff using: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete

alamb avatar Aug 18 '25 17:08 alamb

@alamb I feel like the interesting benchmark to run is topk_tpch

adriangb avatar Aug 18 '25 18:08 adriangb

Really nice 🚀

Dandandan avatar Aug 18 '25 18:08 Dandandan

🤖: Benchmark completed

Details

Comparing HEAD and topk-filters
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2053.78 ms │   1974.31 ms │     no change │
│ QQuery 1     │   735.78 ms │    660.02 ms │ +1.11x faster │
│ QQuery 2     │  1460.75 ms │   1307.73 ms │ +1.12x faster │
│ QQuery 3     │   623.71 ms │    617.23 ms │     no change │
│ QQuery 4     │  1285.31 ms │   1298.10 ms │     no change │
│ QQuery 5     │ 13863.88 ms │  14128.38 ms │     no change │
│ QQuery 6     │  2383.77 ms │   2341.30 ms │     no change │
│ QQuery 7     │  1951.39 ms │   1953.50 ms │     no change │
└──────────────┴─────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 24358.37ms │
│ Total Time (topk-filters)   │ 24280.57ms │
│ Average Time (HEAD)         │  3044.80ms │
│ Average Time (topk-filters) │  3035.07ms │
│ Queries Faster              │          2 │
│ Queries Slower              │          0 │
│ Queries with No Change      │          6 │
│ Queries with Failure        │          0 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.30 ms │      2.32 ms │     no change │
│ QQuery 1     │    29.71 ms │     28.77 ms │     no change │
│ QQuery 2     │    77.53 ms │     76.14 ms │     no change │
│ QQuery 3     │    94.53 ms │     90.05 ms │     no change │
│ QQuery 4     │   597.57 ms │    609.85 ms │     no change │
│ QQuery 5     │   880.91 ms │    881.20 ms │     no change │
│ QQuery 6     │     2.22 ms │      2.35 ms │  1.05x slower │
│ QQuery 7     │    31.73 ms │     33.48 ms │  1.06x slower │
│ QQuery 8     │   861.09 ms │    883.18 ms │     no change │
│ QQuery 9     │  1167.22 ms │   1144.09 ms │     no change │
│ QQuery 10    │   233.16 ms │    227.48 ms │     no change │
│ QQuery 11    │   263.75 ms │    259.39 ms │     no change │
│ QQuery 12    │   865.53 ms │    865.13 ms │     no change │
│ QQuery 13    │  1180.94 ms │   1231.48 ms │     no change │
│ QQuery 14    │   809.24 ms │    809.07 ms │     no change │
│ QQuery 15    │   775.27 ms │    777.78 ms │     no change │
│ QQuery 16    │  1619.25 ms │   1639.14 ms │     no change │
│ QQuery 17    │  1608.47 ms │   1643.10 ms │     no change │
│ QQuery 18    │  2875.25 ms │   3002.07 ms │     no change │
│ QQuery 19    │    79.14 ms │     81.14 ms │     no change │
│ QQuery 20    │  1227.30 ms │   1234.70 ms │     no change │
│ QQuery 21    │  1377.70 ms │   1356.98 ms │     no change │
│ QQuery 22    │  2268.17 ms │   2238.49 ms │     no change │
│ QQuery 23    │  7690.27 ms │   6835.65 ms │ +1.13x faster │
│ QQuery 24    │   426.90 ms │    417.85 ms │     no change │
│ QQuery 25    │   296.39 ms │    294.12 ms │     no change │
│ QQuery 26    │   428.93 ms │    417.70 ms │     no change │
│ QQuery 27    │  1618.21 ms │   1573.12 ms │     no change │
│ QQuery 28    │ 12002.03 ms │  12016.65 ms │     no change │
│ QQuery 29    │   520.45 ms │    523.19 ms │     no change │
│ QQuery 30    │   774.86 ms │    761.23 ms │     no change │
│ QQuery 31    │   794.38 ms │    783.65 ms │     no change │
│ QQuery 32    │  2461.37 ms │   2452.40 ms │     no change │
│ QQuery 33    │  3218.56 ms │   3214.72 ms │     no change │
│ QQuery 34    │  3200.35 ms │   3223.54 ms │     no change │
│ QQuery 35    │  1267.63 ms │   1276.62 ms │     no change │
│ QQuery 36    │   125.57 ms │    121.93 ms │     no change │
│ QQuery 37    │    53.62 ms │     53.02 ms │     no change │
│ QQuery 38    │   127.07 ms │    124.19 ms │     no change │
│ QQuery 39    │   206.77 ms │    199.55 ms │     no change │
│ QQuery 40    │    41.71 ms │     41.96 ms │     no change │
│ QQuery 41    │    39.44 ms │     38.61 ms │     no change │
│ QQuery 42    │    32.99 ms │     32.51 ms │     no change │
└──────────────┴─────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 54255.46ms │
│ Total Time (topk-filters)   │ 53519.60ms │
│ Average Time (HEAD)         │  1261.75ms │
│ Average Time (topk-filters) │  1244.64ms │
│ Queries Faster              │          1 │
│ Queries Slower              │          2 │
│ Queries with No Change      │         40 │
│ Queries with Failure        │          0 │
└─────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ topk-filters ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │  98.49 ms │    101.87 ms │    no change │
│ QQuery 2     │  20.77 ms │     21.10 ms │    no change │
│ QQuery 3     │  32.36 ms │     32.66 ms │    no change │
│ QQuery 4     │  18.17 ms │     18.49 ms │    no change │
│ QQuery 5     │  50.23 ms │     50.16 ms │    no change │
│ QQuery 6     │  11.82 ms │     15.01 ms │ 1.27x slower │
│ QQuery 7     │  85.27 ms │     90.07 ms │ 1.06x slower │
│ QQuery 8     │  24.46 ms │     24.63 ms │    no change │
│ QQuery 9     │  53.23 ms │     54.00 ms │    no change │
│ QQuery 10    │  41.16 ms │     41.41 ms │    no change │
│ QQuery 11    │  38.92 ms │     39.16 ms │    no change │
│ QQuery 12    │  30.06 ms │     30.97 ms │    no change │
│ QQuery 13    │  25.64 ms │     26.23 ms │    no change │
│ QQuery 14    │   9.69 ms │     10.09 ms │    no change │
│ QQuery 15    │  19.14 ms │     19.15 ms │    no change │
│ QQuery 16    │  17.42 ms │     17.41 ms │    no change │
│ QQuery 17    │  93.98 ms │     94.65 ms │    no change │
│ QQuery 18    │ 183.23 ms │    185.54 ms │    no change │
│ QQuery 19    │  23.37 ms │     23.42 ms │    no change │
│ QQuery 20    │  30.99 ms │     31.68 ms │    no change │
│ QQuery 21    │ 139.07 ms │    141.79 ms │    no change │
│ QQuery 22    │  14.03 ms │     14.21 ms │    no change │
└──────────────┴───────────┴──────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)           │ 1061.48ms │
│ Total Time (topk-filters)   │ 1083.67ms │
│ Average Time (HEAD)         │   48.25ms │
│ Average Time (topk-filters) │   49.26ms │
│ Queries Faster              │         0 │
│ Queries Slower              │         2 │
│ Queries with No Change      │        20 │
│ Queries with Failure        │         0 │
└─────────────────────────────┴───────────┘

alamb avatar Aug 18 '25 18:08 alamb

QQuery 23 │ 7751.39 ms │ 6855.54 ms │ +1.13x faster QQuery 23 │ 7690.27 ms │ 6835.65 ms │ +1.13x faster

I'm guessing this is mostly real 😄

adriangb avatar Aug 18 '25 18:08 adriangb

🤖 ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing topk-filters (3945aba663062debc24757c857a8bd679ad8b223) to 8f15991f33bf6aca9d4da8958141b59d196b2ed6 diff using: topk_tpch Results will be posted here when complete

alamb avatar Aug 19 '25 17:08 alamb

Given the nice performance improvements this seems to provide, I plan to review this PR carefully in the next day or two

alamb avatar Aug 19 '25 18:08 alamb

🤖: Benchmark completed

Details

Comparing HEAD and topk-filters
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ topk-filters ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  24.21 ms │      7.05 ms │ +3.43x faster │
│ Q2           │  31.06 ms │     30.96 ms │     no change │
│ Q3           │ 100.83 ms │     92.74 ms │ +1.09x faster │
│ Q4           │  40.93 ms │     33.97 ms │ +1.20x faster │
│ Q5           │  23.65 ms │     27.52 ms │  1.16x slower │
│ Q6           │  45.75 ms │     49.21 ms │  1.08x slower │
│ Q7           │ 119.32 ms │    125.34 ms │  1.05x slower │
│ Q8           │  93.13 ms │     71.32 ms │ +1.31x faster │
│ Q9           │  94.83 ms │     92.51 ms │     no change │
│ Q10          │ 166.33 ms │    160.07 ms │     no change │
│ Q11          │  80.40 ms │     11.82 ms │ +6.80x faster │
└──────────────┴───────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)           │ 820.45ms │
│ Total Time (topk-filters)   │ 702.52ms │
│ Average Time (HEAD)         │  74.59ms │
│ Average Time (topk-filters) │  63.87ms │
│ Queries Faster              │        5 │
│ Queries Slower              │        3 │
│ Queries with No Change      │        3 │
│ Queries with Failure        │        0 │
└─────────────────────────────┴──────────┘

alamb avatar Aug 19 '25 18:08 alamb

Q11 │ 80.40 ms │ 11.82 ms │ +6.80x faster

image

adriangb avatar Aug 19 '25 18:08 adriangb

🌶️ 🌶️ 🌶️

alamb avatar Aug 19 '25 18:08 alamb

🤖 ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing topk-filters (ffe68565b55ba640f9c5197a4f38106ab0460807) to 8f15991f33bf6aca9d4da8958141b59d196b2ed6 diff using: tpch_mem Results will be posted here when complete

alamb avatar Aug 19 '25 22:08 alamb