Yongting You

Results 35 comments of Yongting You

Based on the earlier prototype in https://github.com/apache/arrow-datafusion/pull/7978, we might need several cleanups for this issue. ### The following 2 are necessary for separating `encode()/decode()` and making the API change (in...

> @2010YOUY01 I have updated the task list on this ticket based on your investigation. Please take a look when you have a chance. > > The only one I...

> Thanks for driving this forward @2010YOUY01 -- it is very much appreciated. > > I am planning to merge #8079 in 4 more days (after it has been open...

> Thank you @2010YOUY01 . This PR, as all your others, is well written, documented and tested and is easy to read and understand. Thank you so much. > >...

> # Does this belong in Datafusion core? Or does it belong as an add on? > With this level of specialization required, I wonder where shall we stop adding...

https://github.com/apache/arrow-datafusion/pull/7376 made several smart optimizations for `median()`, for example an O(n) quickselect in the final evaluate step of the aggregation. For `select median(l_partkey) from lineitem` using sf10 parquet TPCH data:...
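To illustrate the quickselect idea mentioned in the comment above, here is a minimal sketch in Rust. It is not the code from that PR: the `median` helper is hypothetical, it works on a plain `i64` slice rather than Arrow arrays, and it relies on the standard library's `select_nth_unstable`, which places the element at the requested index in its sorted position without fully sorting the slice.

```rust
/// Hypothetical helper (not DataFusion's implementation): median of a slice
/// via selection instead of a full sort.
/// Returns `None` for empty input; for even lengths, averages the two middle values.
fn median(values: &mut [i64]) -> Option<f64> {
    let n = values.len();
    if n == 0 {
        return None;
    }
    let mid = n / 2;
    // After this call, the element at index `mid` is the one that would be
    // there in sorted order; everything in `lower` is <= that element.
    let (lower, upper_mid, _) = values.select_nth_unstable(mid);
    let upper_mid = *upper_mid;
    if n % 2 == 1 {
        Some(upper_mid as f64)
    } else {
        // `lower` is non-empty here because n >= 2 implies mid >= 1.
        // The other middle value is the largest element of the lower part.
        let lower_mid = *lower.iter().max().unwrap();
        Some((lower_mid as f64 + upper_mid as f64) / 2.0)
    }
}
```

The selection step avoids the O(n log n) cost of sorting the whole input just to read one or two middle elements; the extra linear scan for even lengths keeps the overall cost linear.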

Really impressive work! 1. I suggest opening another PR for benchmarks only; it can be merged easily and will also help attract more attention. 2. I have a question: (just skimmed...

Thanks for working on it. `3.i.` is definitely not efficient for memory usage. 🤦🏼 Perhaps limiting the intermediate result to ~1 batch size is enough to keep the performance....

> > limiting the intermediate result to ~1 batch size is enough to keep the performance. > > Do you mean we should also limit num_row of [`left_side, right_side`](https://github.com/apache/datafusion/blob/69dfe6c499d39563f4e6d9835fcdf3793f7d98c8/datafusion/physical-plan/src/joins/nested_loop_join.rs#L986)...

Thank you. I'm wondering what the reference system is for this function's behavior (like Postgres or others).