`analysis.rs` bounds check panic
Describe the bug
Me again..
I'm getting a panic from
https://github.com/apache/datafusion/blob/cafbc9ddceb5af8c6408d0c8bbfed7568f655ddb/datafusion/physical-expr/src/analysis.rs#L95
I'm not exactly sure yet what's causing it, I'll give more details when I get them.
I must say, I'm bit surprised there's code that's so likely to panic in a situation where:
- returning an error is easy
- there's not guarantee from nearby code that
indexshould be in bounds in all common scenarios.
backtrace is:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic_bounds_check
3: datafusion_physical_expr::analysis::ExprBoundaries::try_from_column
4: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
5: datafusion_physical_expr::analysis::AnalysisContext::try_from_statistics
6: datafusion_physical_plan::filter::FilterExec::statistics_helper
7: datafusion_physical_plan::filter::FilterExec::try_new
8: datafusion::physical_planner::DefaultPhysicalPlanner::map_logical_node_to_physical::{{closure}}
9: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
10: <S as futures_core::stream::TryStream>::try_poll_next
11: <futures_util::stream::try_stream::try_collect::TryCollect<St,C> as core::future::future::Future>::poll
12: datafusion::physical_planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
13: <datafusion::physical_planner::DefaultPhysicalPlanner as datafusion::physical_planner::PhysicalPlanner>::create_physical_plan::{{closure}}
14: <datafusion::execution::context::DefaultQueryPlanner as datafusion::execution::context::QueryPlanner>::create_physical_plan::{{closure}}
15: datafusion::dataframe::DataFrame::create_physical_plan::{{closure}}
16: datafusion::dataframe::DataFrame::collect::{{closure}}
To Reproduce
no public code right now, but this is somewhat related to my work on #10400, although is specialised enough that my code wouldn't be solution to that.
Expected behavior
Either my stats are invalid in which case there should be an error, or my stats are okay and it should pass
Additional context
No response
I think the analysis code is fairly unused at this point. Getting it more ship shape will be great.
(Thank you so much for filing these issues)
Hi, I just stumbled on this issue, and from what I'm getting, we can return an error in this case since panicking is maybe unwanted. A way to do this:
pub fn try_from_column(
schema: &Schema,
col_stats: &ColumnStatistics,
col_index: usize,
) -> Result<Self> {
let field = schema.fields()
.get(col_index)
.ok_or_else(|| DataFusionError::Internal(
format!("Could not create `ExprBoundaries` from column index {col_index} in ::try_from_column(): Column index out of bounds, schema has {} columns.\n", schema.fields.len())
))?;
...
}
This function only get's called from https://github.com/apache/datafusion/blob/e1cfb48215ee91a183e06cfee602e42d2c23f429/datafusion/physical-expr/src/analysis.rs#L59C1-L70C6 Which indicates there are more column stats than columns in the schema.
I can't say more since I can't reproduce the error, so I'm sorry for any incompleteness.
I agree that changing to return an internal error sounds like a much better plan
take