Artem Kroviakov
To see the benefit and compare to the current multifrag, one must be able to (1) linearize on GPU one-by-one and (2) modify some kernel (I'm thinking of HT builder)...
Quick data on a simple group by 10'000'000 rows:
```
Current multifrag, 2nd run, 20 frags:
77ms start(16ms) executePlan Execute.cpp:3540
Current one frag, 2nd run:
76ms start(13ms) executePlan Execute.cpp:3576
```
...
> As an aside, linearization cost is very high for varlen data. In relation to the idea: if it has to be sent to GPU anyways and we know the...
I agree that having a linear buffer for a column on GPU, where we place loaded fragments one after another, doesn't make a lot of sense in case there is...
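To make the "linear buffer" idea concrete, here is a minimal host-side sketch of placing fragments one after another in a single contiguous buffer. It only illustrates the layout being discussed; it is not the actual GPU code path, and the names `linearize`, `fragments`, and `offsets` are hypothetical.

```python
from array import array


def linearize(fragments, typecode="q"):
    """Copy fragments one by one into a single contiguous buffer.

    `fragments` is a list of integer lists (hypothetical stand-ins for
    per-fragment column chunks). Returns the buffer and the start offset
    of every fragment inside it.
    """
    total = sum(len(frag) for frag in fragments)
    buf = array(typecode, [0]) * total  # preallocate the whole column once
    offsets = []
    pos = 0
    for frag in fragments:
        offsets.append(pos)
        # Place this fragment right after the previous one.
        buf[pos:pos + len(frag)] = array(typecode, frag)
        pos += len(frag)
    return buf, offsets


buf, offsets = linearize([[1, 2], [3], [4, 5, 6]])
```

The offsets are kept so that a consumer can still address individual fragments after they have been placed back to back.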
> if size(red_dims) > 1, rewrite the reduction into multiple reductions where each reduction is over single dims
> if size(red_dims) == 1, apply the currently implemented logic

Yes, this...
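The quoted rewrite rule can be sketched on plain nested lists: a reduction over several dimensions is replaced by a chain of reductions, each over exactly one dimension. This is only an illustration of the decomposition, not the pass under review; `reduce_single_dim` and `reduce_dims` are hypothetical names, and addition is assumed as the combining operation.

```python
import operator
from functools import reduce as fold


def _elementwise(a, b, op):
    """Combine two equally shaped nested lists (or scalars) elementwise."""
    if isinstance(a, list):
        return [_elementwise(x, y, op) for x, y in zip(a, b)]
    return op(a, b)


def reduce_single_dim(tensor, dim, op=operator.add):
    """Reduce a nested-list tensor over exactly one dimension."""
    if dim == 0:
        # Fold the slices along the outermost dimension together.
        return fold(lambda a, b: _elementwise(a, b, op), tensor)
    return [reduce_single_dim(sub, dim - 1, op) for sub in tensor]


def reduce_dims(tensor, red_dims, op=operator.add):
    """Rewrite a multi-dim reduction as a chain of single-dim reductions."""
    # Reduce the highest dimension first so the remaining indices stay valid.
    for dim in sorted(red_dims, reverse=True):
        tensor = reduce_single_dim(tensor, dim, op)
    return tensor


t = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape 2x2x2
partial = reduce_dims(t, [0, 2])
total = reduce_dims(t, [0, 1, 2])
```

With a single entry in `red_dims` the loop runs once, which corresponds to the "apply the currently implemented logic" branch of the quote.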
> why do we need to rewrite it as multiple reduction instructions doing one reduce dim at a time? Wg to sg can do multi-dim reduce locally first and then...