Support complex filter before merge procedure

Open ShiKaiWi opened this issue 3 years ago • 1 comments

Describe This Problem

Before the record batches read from SSTs are fed to the merge iterator, they are filtered according to the query predicates. However, the filter only supports a very simple form, a conjunction (AND) of binary expressions, so it doesn't work when the query predicate is more complex, e.g. where (hostname = '127.0.0.1' or hostname = '192.168.0.2') and timestamp between 'xxxx' and 'xxxx'.
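To illustrate the limitation, here is a minimal sketch of why a flat AND list cannot carry the predicate above. The `Expr` enum and `split_conjunction` are hypothetical, simplified stand-ins written for this example (they are not the engine's actual types), and `Between` is treated as a single leaf for brevity.

```rust
// Hypothetical, simplified predicate model; not the engine's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // column = literal
    Eq(String, String),
    // low <= column <= high (kept as one leaf for brevity)
    Between(String, String, String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Flattens `expr` into leaf filters that are all AND-ed together.
/// Returns `None` as soon as an `Or` is found, because a flat AND list
/// has no way to represent it.
fn split_conjunction(expr: &Expr) -> Option<Vec<Expr>> {
    match expr {
        Expr::And(l, r) => {
            let mut out = split_conjunction(l)?;
            out.extend(split_conjunction(r)?);
            Some(out)
        }
        Expr::Or(_, _) => None,
        leaf => Some(vec![leaf.clone()]),
    }
}

/// Builds the predicate from the issue description:
/// (hostname = '127.0.0.1' or hostname = '192.168.0.2')
///   and timestamp between 'xxxx' and 'xxxx'
fn issue_predicate() -> Expr {
    let hosts = Expr::Or(
        Box::new(Expr::Eq("hostname".into(), "127.0.0.1".into())),
        Box::new(Expr::Eq("hostname".into(), "192.168.0.2".into())),
    );
    let time = Expr::Between("timestamp".into(), "xxxx".into(), "xxxx".into());
    Expr::And(Box::new(hosts), Box::new(time))
}
```

With this model, `split_conjunction(&issue_predicate())` yields `None`, while a predicate made only of AND-ed leaves flattens into a list of leaf filters.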

Proposal

The crucial point is how to make the filter procedure support complex predicate expressions. There are basically two approaches:

  • Utilize datafusion;
  • Implement the filter logic manually;

And I vote for the first approach, but we have to figure out how to utilize datafusion to implement the filter logic.
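For comparison, the second (manual) approach could look roughly like the sketch below: a recursive evaluator that turns an expression tree into one boolean per row. Everything here is an assumption made for illustration: `Batch` is a plain map of `i64` columns, and `Expr` and `eval_mask` are hypothetical names, not existing code.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: a batch maps column names to i64 values.
type Batch = HashMap<String, Vec<i64>>;

// Hypothetical predicate tree; only a few node kinds are modeled.
enum Expr {
    Eq(String, i64),
    Between(String, i64, i64), // inclusive on both ends
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Evaluates `expr` to one boolean per row.
/// Returns `None` if a referenced column is missing from the batch.
fn eval_mask(expr: &Expr, batch: &Batch) -> Option<Vec<bool>> {
    match expr {
        Expr::Eq(col, v) => Some(batch.get(col)?.iter().map(|x| x == v).collect()),
        Expr::Between(col, lo, hi) => {
            Some(batch.get(col)?.iter().map(|x| lo <= x && x <= hi).collect())
        }
        Expr::And(l, r) => {
            let (a, b) = (eval_mask(l, batch)?, eval_mask(r, batch)?);
            Some(a.iter().zip(&b).map(|(x, y)| *x && *y).collect())
        }
        Expr::Or(l, r) => {
            let (a, b) = (eval_mask(l, batch)?, eval_mask(r, batch)?);
            Some(a.iter().zip(&b).map(|(x, y)| *x || *y).collect())
        }
    }
}
```

Even this toy version has to handle every node kind and data type by hand, which is the main cost of the manual route compared with reusing an existing expression engine.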

Additional Context

No response

ShiKaiWi avatar Sep 15 '22 08:09 ShiKaiWi

TSBS has been added to CI, so we can use it to compare performance before and after fixing this issue:

  • https://github.com/CeresDB/ceresdb/actions/runs/3102504402#summary-8485564354

jiacai2050 avatar Sep 22 '22 03:09 jiacai2050

To utilize datafusion, we can:

  • Create a PhysicalExpr from the logical expression via create_physical_expr.
  • Implement the filter logic the way FilterExecStream does in datafusion.

create_physical_expr: https://github.com/apache/arrow-datafusion/blob/45fc415daa7028559ef3477e53a184a114149f9e/datafusion/physical-expr/src/planner.rs#L42

FilterExecStream: https://github.com/apache/arrow-datafusion/blob/45fc415daa7028559ef3477e53a184a114149f9e/datafusion/core/src/physical_plan/filter.rs#L180
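The overall shape of the two steps above (evaluate a compiled predicate to a boolean mask, then keep only the matching rows of each batch) can be sketched without any dependencies. This is only an illustration: `Batch`, `RowPredicate`, `HostsInRange`, `filter_batch` and `filtered` are hypothetical stand-ins, not datafusion or engine types, and dropping fully filtered batches is a choice made for this sketch.

```rust
// Hypothetical stand-ins for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    hosts: Vec<String>,
    timestamps: Vec<i64>,
}

/// Plays the role of a compiled predicate: one boolean per input row.
trait RowPredicate {
    fn evaluate(&self, batch: &Batch) -> Vec<bool>;
}

/// (hostname in `hosts`) and (start <= timestamp <= end)
struct HostsInRange {
    hosts: Vec<String>,
    start: i64,
    end: i64,
}

impl RowPredicate for HostsInRange {
    fn evaluate(&self, batch: &Batch) -> Vec<bool> {
        batch
            .hosts
            .iter()
            .zip(&batch.timestamps)
            .map(|(h, t)| self.hosts.contains(h) && self.start <= *t && *t <= self.end)
            .collect()
    }
}

/// Keeps only the rows whose mask entry is true.
fn filter_batch(batch: &Batch, mask: &[bool]) -> Batch {
    let mut out = Batch { hosts: Vec::new(), timestamps: Vec::new() };
    for (i, keep) in mask.iter().enumerate() {
        if *keep {
            out.hosts.push(batch.hosts[i].clone());
            out.timestamps.push(batch.timestamps[i]);
        }
    }
    out
}

/// Wraps a batch iterator so every batch is filtered before it reaches the
/// downstream consumer (e.g. a merge step); empty results are dropped.
fn filtered<'a, I>(input: I, predicate: &'a dyn RowPredicate) -> impl Iterator<Item = Batch> + 'a
where
    I: Iterator<Item = Batch> + 'a,
{
    input
        .map(move |b| {
            let mask = predicate.evaluate(&b);
            filter_batch(&b, &mask)
        })
        .filter(|b| !b.hosts.is_empty())
}
```

In a real implementation the predicate would presumably be the PhysicalExpr produced by create_physical_expr, and the row selection would be done by Arrow's filter kernels rather than a hand-written loop.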

Maybe I can help with this task :D.

ygf11 avatar Oct 05 '22 03:10 ygf11

It would be much appreciated if you could help.

ShiKaiWi avatar Oct 06 '22 00:10 ShiKaiWi

@ygf11 I have updated the code location of the filtering procedure, and I hope it will help: https://github.com/CeresDB/ceresdb/blob/43a84ba3c2ddcee69906e70322060b6dc4e91ddc/analytic_engine/src/row_iter/record_batch_stream.rs#L137

ShiKaiWi avatar Oct 08 '22 07:10 ShiKaiWi

I have updated the code location of the filtering procedure, and I hope it will help.

Thanks for the reminder, it helps a lot.

ygf11 avatar Oct 08 '22 09:10 ygf11