Index out of bounds using lance extension in duckdb v0.7.0
I built the lance duckdb extension from commit 2972ae209fd159b6ff15266d0a457f144029aa60. I can load the extension in duckdb v0.7.0:
RUST_BACKTRACE=1 duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;
I then downloaded the vec_data.lance dataset from s3://eto-public/datasets/sift/vec_data.lance/, but when I execute the following select
D select count(*) from lance_scan('vec_data.lance');
I get this exception:
thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
0: rust_begin_unwind
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
2: core::panicking::panic_bounds_check
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
4: _read_lance_init
5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
12:
13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
19: _duckdb_shell_sqlite3_print_duckbox
20: _exec_prepared_stmt
21: _shell_exec
22: _runOneSqlLine
23: _process_input
24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6
I tried the same with a smaller dataset created with Pandas:
import pandas as pd
import lance
df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
['Shailesh', 79, 186, 'yes'],
['Seema', 67, 158, 'yes'],
['Nidhi', 52, 155, 'no']])
df.columns = ['name', 'weight', 'height', 'smoker']
lance.write_dataset(df, '/tmp/small.lance')
and when I execute the following
duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;
D select count(*) from lance_scan('small.lance');
I get the following exception:
thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
0: rust_begin_unwind
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
2: core::panicking::panic_bounds_check
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
4: _read_lance_init
5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
12: __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
19: _duckdb_shell_sqlite3_print_duckbox
20: _exec_prepared_stmt
21: _shell_exec
22: _runOneSqlLine
23: _process_input
24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6
And here is the full trace (RUST_BACKTRACE=full):
thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
0: 0x118744912 - std::backtrace_rs::backtrace::libunwind::trace::hf6d6e64f9b264809
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x118744912 - std::backtrace_rs::backtrace::trace_unsynchronized::h83629c2e54dbbc12
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x118744912 - std::sys_common::backtrace::_print_fmt::h40995e5769fa5524
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:65:5
3: 0x118744912 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d94e552d95b28cc
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:44:22
4: 0x118767f9a - core::fmt::write::h421d4212716e9716
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
5: 0x11873e4bc - std::io::Write::write_fmt::hdc28b71c2d62dad8
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/io/mod.rs:1682:15
6: 0x1187446da - std::sys_common::backtrace::_print::habfe2bb38db219c3
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:47:5
7: 0x1187446da - std::sys_common::backtrace::print::he11eab6b959c3b5b
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:34:9
8: 0x118746446 - std::panicking::default_hook::{{closure}}::ha68ba8cbe26bbbe3
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:267:22
9: 0x118746197 - std::panicking::default_hook::h5cf85224a4df5bc6
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:286:9
10: 0x118746b8d - std::panicking::rust_panic_with_hook::hed342721bf9addfa
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:688:13
11: 0x118746943 - std::panicking::begin_panic_handler::{{closure}}::h3d9af89e51f2fba9
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:579:13
12: 0x118744da8 - std::sys_common::backtrace::__rust_end_short_backtrace::hfb9719355016e93f
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:137:18
13: 0x11874660d - rust_begin_unwind
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
14: 0x1188d6103 - core::panicking::panic_fmt::h1965fc2159be50bb
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
15: 0x1188d6246 - core::panicking::panic_bounds_check::h503aa148bf97089f
at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
16: 0x11688216a - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::h3d52f90a316a19bb
17: 0x1168702d0 - _read_lance_init
18: 0x11671565e - __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
19: 0x104054aa9 - __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
20: 0x104052fcf - __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
21: 0x104176f4f - __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
22: 0x10417828b - __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
23: 0x1041783f9 - __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
24: 0x1041793ea - __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
25: 0x1040e5fe7 - __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
26: 0x1040eb651 - __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
27: 0x1040e9084 - __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
28: 0x1040e8bed - __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
29: 0x1040e979d - __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
30: 0x1040ff998 - __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
31: 0x1040f5499 - __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
32: 0x10302cbe9 - _duckdb_shell_sqlite3_print_duckbox
33: 0x10301ab81 - _exec_prepared_stmt
34: 0x10300d2bd - _shell_exec
35: 0x10301c5bd - _runOneSqlLine
36: 0x10300e2b1 - _process_input
37: 0x1030015bf - _main
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6
@changhiskhan any progress on this?
OK, I can reproduce this. It seems to happen only with count(), not with SELECT * FROM lance_scan(). Let me look into it.
Update:
So this query works: SELECT COUNT(name) FROM lance_scan("/tmp/small.lance"), but SELECT COUNT(*) FROM lance_scan("/tmp/small.lance") does not.
It seems that at
https://github.com/eto-ai/lance/blob/5c370e9220b8b97e7b873497397ff7412adf7d98/integration/duckdb_lance/src/scan.rs#L107
projected_column_id() returns u64::MAX = 18446744073709551615, which is then used as an index and triggers the out-of-bounds panic.
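For context, 18446744073709551615 is what -1 looks like when reinterpreted as an unsigned 64-bit integer, so it reads like a "no real column" sentinel rather than a valid index. Below is a minimal standalone sketch of that observation; it is not the extension's code, and `column_at` is a hypothetical helper introduced only for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of the lance extension): look up a column
/// name by projected id, returning None instead of panicking when the id
/// does not fit in the slice.
fn column_at<'a>(columns: &[&'a str], id: u64) -> Option<&'a str> {
    usize::try_from(id).ok().and_then(|i| columns.get(i).copied())
}

fn main() {
    let columns = ["name", "weight", "height", "smoker"];

    // -1 reinterpreted as u64 is the index reported in the panic message.
    let sentinel = -1i64 as u64;
    println!("sentinel == u64::MAX: {}", sentinel == u64::MAX);

    // A checked lookup handles both a valid id and the sentinel.
    println!("{:?}", column_at(&columns, 0));
    println!("{:?}", column_at(&columns, sentinel));

    // By contrast, `columns[sentinel as usize]` would panic with
    // "index out of bounds", as in the backtrace above.
}
```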
Fix:
When the projected column id is u64::MAX, pick any (preferably the smallest) column instead.
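A rough sketch of what that fallback could look like, written as a standalone function rather than against the real scan.rs. The name `resolve_projection` and its signature are assumptions made for illustration, and it uses the first column (index 0) as a stand-in for "smallest"; picking the cheapest column to read would need schema information not shown here.

```rust
/// Hypothetical helper: turn the projected column ids reported by the engine
/// into indices into the dataset's column list.
///
/// Assumption: u64::MAX means "no concrete column requested" (as observed
/// with COUNT(*)), so it is mapped to column 0.
fn resolve_projection(ids: &[u64], num_columns: usize) -> Result<Vec<usize>, String> {
    ids.iter()
        .map(|&raw| {
            // Replace the sentinel with the fallback column.
            let id = if raw == u64::MAX { 0 } else { raw };
            if id < num_columns as u64 {
                Ok(id as usize)
            } else {
                Err(format!(
                    "column id {} out of range for {} columns",
                    raw, num_columns
                ))
            }
        })
        .collect()
}

fn main() {
    // COUNT(*)-style projection: only the sentinel is reported.
    println!("{:?}", resolve_projection(&[u64::MAX], 4));
    // Ordinary projection: ids pass through unchanged.
    println!("{:?}", resolve_projection(&[2, 0], 4));
    // A genuinely invalid id becomes an error instead of a panic.
    println!("{:?}", resolve_projection(&[7], 4));
}
```

Returning a Result keeps an unexpected id from aborting the whole DuckDB process, which is what the panic across the FFI boundary does in the traces above.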