wiedld

Results 18 comments of wiedld

For any questions on the big-picture design, refer to the [diagrams here](https://github.com/wiedld/arrow-datafusion/pull/1). Note that the linked PR is a draft follow-up PR. I've tried to incorporate some of the naming/wording changes suggested,...

Run with this slicing code applied (a single composable merge node): [branch here](https://github.com/wiedld/arrow-datafusion/pull/1). Machine: GCP c3d-standard-8-lssd, debian-11. There are further confirmation steps, as well as hypotheses, as to why...

I think we should close this @alamb , since it's not a priority at the moment. And whenever we circle back, there will be a very large diff (due to...

Hypothetically, this could be a bad payload from the UI side. I am at least ruling out that option with some payload validation (see PR linked above).

Errors reoccurred. This is not due to the payload; it is a runtime borrow bug, which is rather difficult to chase down without more data. Leaving open for now. The errors...

> But I think @wiedld said she didn't have good luck with it so your mileage may vary

While using the Xcode allocations tool, I was getting In general I...

Ah, I forgot to mention a key point. When extracting data via heaptrack_print, I was looking at memory peaks and hence used `--flamegraph-cost-type peak`. You may want to check [other...

> I wonder if this could be related to DataFusion overriding the data_page_row_limit setting in https://github.com/apache/datafusion/issues/11367 (that @wiedld is working on)

@alamb is mentioning the `data_page_row_limit` since in our own...

> I also found https://github.com/apache/arrow-rs/issues/5828 which might be related and/or relevant.

@hveiga is correct that this is one suspected place with extra memory usage (specifically in the dict_encoder) when processing...

I have an alternative solution, done in the process of fixing https://github.com/apache/datafusion/issues/12119. PR up shortly.