Artem Kroviakov comments

Results: 6 comments of Artem Kroviakov

device-linear-multifrag execution mode

To see the benefit and compare to the current multifrag, one must be able to (1) linearize on GPU one-by-one and (2) modify some kernel (I'm thinking of HT builder)...

device-linear-multifrag execution mode

Quick data on a simple group by over 10'000'000 rows:

```
Current multifrag, 2nd run, 20 frags:
77ms start(16ms) executePlan Execute.cpp:3540
Current one frag, 2nd run:
76ms start(13ms) executePlan Execute.cpp:3576
```
...

device-linear-multifrag execution mode

> As an aside, linearization cost is very high for varlen data. In relation to the idea: if it has to be sent to GPU anyways and we know the...

device-linear-multifrag execution mode

I agree that having a linear buffer for a column on GPU, where we place loaded fragments one after another doesn't make a lot of sense in case there is...
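As a rough illustration of what "placing loaded fragments one after another" in a linear buffer means, here is a minimal host-side sketch. It is not the project's code: the function name `linearize_fragments` and the offset table are hypothetical, and plain Python lists stand in for device memory.

```python
from typing import List, Tuple


def linearize_fragments(fragments: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate per-fragment column chunks into one contiguous buffer.

    Returns (buffer, offsets). offsets[i] is the start of fragment i in the
    buffer, and offsets[-1] is the total number of rows.
    This is a hypothetical helper for illustration only.
    """
    buffer: List[int] = []
    offsets: List[int] = [0]
    for frag in fragments:
        # Append this fragment directly after the previous one.
        buffer.extend(frag)
        offsets.append(len(buffer))
    return buffer, offsets


if __name__ == "__main__":
    buf, offs = linearize_fragments([[1, 2], [3], [4, 5, 6]])
    print(buf, offs)
```

With such a layout a kernel can treat the column as a single array, while the offset table still records where each fragment begins.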

[MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg

> if size(red_dims) > 1, rewrite the reduction into multiple reductions where each reduction is over single dims
> if size(red_dims) == 1, apply the currently implemented logic

Yes, this...
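The quoted rewrite rule can be illustrated outside MLIR. The sketch below is not the XeGPU pass: it only shows, on nested Python lists and with sum as the reduction, that a reduction over several dimensions can be expressed as a chain of single-dimension reductions. The helper names (`reduce_dim`, `reduce_dims`) are hypothetical.

```python
def _add(a, b):
    """Elementwise add of two equally shaped nested lists (or scalars)."""
    if isinstance(a, list):
        return [_add(x, y) for x, y in zip(a, b)]
    return a + b


def reduce_dim(tensor, dim):
    """Sum a nested-list tensor over exactly one dimension."""
    if dim == 0:
        # Fold all slices along the leading dimension together.
        acc = tensor[0]
        for s in tensor[1:]:
            acc = _add(acc, s)
        return acc
    # Otherwise recurse into each slice with the dimension index shifted.
    return [reduce_dim(sub, dim - 1) for sub in tensor]


def reduce_dims(tensor, red_dims):
    """Multi-dim reduction rewritten as a chain of single-dim reductions."""
    # Reduce the highest dimension first so the remaining indices stay valid.
    for dim in sorted(red_dims, reverse=True):
        tensor = reduce_dim(tensor, dim)
    return tensor


if __name__ == "__main__":
    t = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2)
    print(reduce_dims(t, [0, 2]))
```

The equivalence relies on the reduction operator being associative and commutative, as sum is here.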

[MLIR][XeGPU] Add support for cross-subgroup reduction from wg to sg

> why do we need to rewrite it as multiple reduction instructions doing one reduce dim at a time? Wg to sg can do multi-dim reduce locally first and then...