
Index out of bounds using lance extension in duckdb v0.7.0

Open jalateras opened this issue 2 years ago • 3 comments

I built the lance duckdb extension from version 2972ae209fd159b6ff15266d0a457f144029aa60. I can load the extension in duckdb v0.7.0:

RUST_BACKTRACE=1 duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;

I then downloaded the vec_data.lance dataset from s3://eto-public/datasets/sift/vec_data.lance/, but when I execute the following select

D select count(*) from lance_scan('vec_data.lance');

I get this panic:

thread '<unnamed>' panicked at 'index out of bounds: the len is 2 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_bounds_check
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
   3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   4: _read_lance_init
   5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
   6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
   7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
   8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
   9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  12: 
  13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  19: _duckdb_shell_sqlite3_print_duckbox
  20: _exec_prepared_stmt
  21: _shell_exec
  22: _runOneSqlLine
  23: _process_input
  24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

jalateras avatar May 05 '23 01:05 jalateras

I tried the same with a smaller dataset created with pandas:

import pandas as pd
import lance

df = pd.DataFrame([['Ajitesh', 84, 183, 'no'],
                   ['Shailesh', 79, 186, 'yes'],
                   ['Seema', 67, 158, 'yes'],
                   ['Nidhi', 52, 155, 'no']])
df.columns = ['name', 'weight', 'height', 'smoker']
lance.write_dataset(df, '/tmp/small.lance')

When I execute the following

duckdb --unsigned
v0.7.0 f7827396d7
Enter ".help" for usage hints.
D load lance;
D select count(*) from lance_scan('small.lance');

I get the same panic, this time with a len of 4 (the number of columns in the dataset):

thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_bounds_check
             at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
   3: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   4: _read_lance_init
   5: __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
   6: __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
   7: __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
   8: __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
   9: __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  10: __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  11: __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  12: __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  13: __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  14: __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  15: __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  16: __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  17: __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  18: __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  19: _duckdb_shell_sqlite3_print_duckbox
  20: _exec_prepared_stmt
  21: _shell_exec
  22: _runOneSqlLine
  23: _process_input
  24: _main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

Here is the full trace (RUST_BACKTRACE=full):

thread '<unnamed>' panicked at 'index out of bounds: the len is 4 but the index is 18446744073709551615', src/scan.rs:110:24
stack backtrace:
   0:        0x118744912 - std::backtrace_rs::backtrace::libunwind::trace::hf6d6e64f9b264809
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:        0x118744912 - std::backtrace_rs::backtrace::trace_unsynchronized::h83629c2e54dbbc12
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x118744912 - std::sys_common::backtrace::_print_fmt::h40995e5769fa5524
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:65:5
   3:        0x118744912 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h8d94e552d95b28cc
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:44:22
   4:        0x118767f9a - core::fmt::write::h421d4212716e9716
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
   5:        0x11873e4bc - std::io::Write::write_fmt::hdc28b71c2d62dad8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/io/mod.rs:1682:15
   6:        0x1187446da - std::sys_common::backtrace::_print::habfe2bb38db219c3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:47:5
   7:        0x1187446da - std::sys_common::backtrace::print::he11eab6b959c3b5b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:34:9
   8:        0x118746446 - std::panicking::default_hook::{{closure}}::ha68ba8cbe26bbbe3
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:267:22
   9:        0x118746197 - std::panicking::default_hook::h5cf85224a4df5bc6
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:286:9
  10:        0x118746b8d - std::panicking::rust_panic_with_hook::hed342721bf9addfa
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:688:13
  11:        0x118746943 - std::panicking::begin_panic_handler::{{closure}}::h3d9af89e51f2fba9
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:579:13
  12:        0x118744da8 - std::sys_common::backtrace::__rust_end_short_backtrace::hfb9719355016e93f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:137:18
  13:        0x11874660d - rust_begin_unwind
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
  14:        0x1188d6103 - core::panicking::panic_fmt::h1965fc2159be50bb
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
  15:        0x1188d6246 - core::panicking::panic_bounds_check::h503aa148bf97089f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:151:5
  16:        0x11688216a - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::h3d52f90a316a19bb
  17:        0x1168702d0 - _read_lance_init
  18:        0x11671565e - __ZN6duckdb18CTableFunctionInitERNS_13ClientContextERNS_22TableFunctionInitInputE
  19:        0x104054aa9 - __ZN6duckdb26TableScanGlobalSourceStateC2ERNS_13ClientContextERKNS_17PhysicalTableScanE
  20:        0x104052fcf - __ZNK6duckdb17PhysicalTableScan20GetGlobalSourceStateERNS_13ClientContextE
  21:        0x104176f4f - __ZN6duckdb8Executor16SchedulePipelineERKNSt3__110shared_ptrINS_12MetaPipelineEEERNS_17ScheduleEventDataE
  22:        0x10417828b - __ZN6duckdb8Executor22ScheduleEventsInternalERNS_17ScheduleEventDataE
  23:        0x1041783f9 - __ZN6duckdb8Executor14ScheduleEventsERKNSt3__16vectorINS1_10shared_ptrINS_12MetaPipelineEEENS1_9allocatorIS5_EEEE
  24:        0x1041793ea - __ZN6duckdb8Executor18InitializeInternalEPNS_16PhysicalOperatorE
  25:        0x1040e5fe7 - __ZN6duckdb13ClientContext24PendingPreparedStatementERNS_17ClientContextLockENSt3__110shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  26:        0x1040eb651 - __ZN6duckdb13ClientContext35PendingStatementOrPreparedStatementERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  27:        0x1040e9084 - __ZN6duckdb13ClientContext43PendingStatementOrPreparedStatementInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEENS3_10unique_ptrINS_12SQLStatementENS3_14default_deleteISD_EEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  28:        0x1040e8bed - __ZN6duckdb13ClientContext28PendingQueryPreparedInternalERNS_17ClientContextLockERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS3_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  29:        0x1040e979d - __ZN6duckdb13ClientContext12PendingQueryERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS1_10shared_ptrINS_21PreparedStatementDataEEENS_22PendingQueryParametersE
  30:        0x1040ff998 - __ZN6duckdb17PreparedStatement12PendingQueryERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  31:        0x1040f5499 - __ZN6duckdb17PreparedStatement7ExecuteERNSt3__16vectorINS_5ValueENS1_9allocatorIS3_EEEEb
  32:        0x10302cbe9 - _duckdb_shell_sqlite3_print_duckbox
  33:        0x10301ab81 - _exec_prepared_stmt
  34:        0x10300d2bd - _shell_exec
  35:        0x10301c5bd - _runOneSqlLine
  36:        0x10300e2b1 - _process_input
  37:        0x1030015bf - _main
fatal runtime error: failed to initiate panic, error 5
Abort trap: 6

jalateras avatar May 05 '23 01:05 jalateras

@changhiskhan any progress on this?

jalateras avatar May 12 '23 07:05 jalateras

OK, I can reproduce this. It seems to happen only with count(*), not with SELECT * FROM lance_scan(). Let me look into it.

Update:

So this query works: SELECT COUNT(name) FROM lance_scan("/tmp/small.lance"), but SELECT COUNT(*) FROM lance_scan("/tmp/small.lance") does not.

It seems that at

https://github.com/eto-ai/lance/blob/5c370e9220b8b97e7b873497397ff7412adf7d98/integration/duckdb_lance/src/scan.rs#L107

projected_column_id() returns u64::MAX = 18446744073709551615 for COUNT(*), and that value is then used as an index into the column list.
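
To illustrate why that value panics, here is a minimal Rust sketch. It is not the extension's actual code; `column_at` is a hypothetical helper. The index in the panic message is exactly u64::MAX, which is what a "no column" sentinel of -1 looks like as an unsigned 64-bit integer (presumably DuckDB's row-id marker, though that is an assumption here). A plain `columns[idx]` panics on it, whereas a checked lookup returns None.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not from the lance source): checked column lookup.
/// Returns None when `idx` is out of range, e.g. for the u64::MAX sentinel,
/// instead of panicking the way `columns[idx as usize]` would.
fn column_at<'a>(columns: &[&'a str], idx: u64) -> Option<&'a str> {
    // Convert to usize first; the conversion itself can fail on 32-bit targets.
    usize::try_from(idx)
        .ok()
        // `get` performs the bounds check and yields None on a miss.
        .and_then(|i| columns.get(i).copied())
}
```

With the four columns of small.lance, `column_at(&cols, 0)` yields the first column, while `column_at(&cols, u64::MAX)` yields None rather than aborting the process.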

Fix:

When the projected column id is u64::MAX, pick any (preferably the smallest) column instead.
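
A rough sketch of that idea, under stated assumptions: `resolve_projection` and `NO_COLUMN` are hypothetical names, the column list is modelled as plain strings, and the fallback is simply the first column (choosing the physically smallest column would need size information this sketch does not have). It is meant to show the shape of the fix, not the change that was actually merged.

```rust
use std::convert::TryFrom;

/// Sentinel taken from the panic message: the projected column id seen
/// for COUNT(*), which refers to no real column.
const NO_COLUMN: u64 = u64::MAX;

/// Hypothetical helper: map projected column ids to column names,
/// substituting a fallback column for the sentinel so rows can still
/// be counted. Out-of-range ids become an error instead of a panic.
fn resolve_projection(columns: &[String], projected_ids: &[u64]) -> Result<Vec<String>, String> {
    // Assumption: the first column is an acceptable stand-in.
    let fallback = columns.first();
    projected_ids
        .iter()
        .map(|&id| {
            if id == NO_COLUMN {
                // Only fails when the dataset has no columns at all.
                return fallback
                    .cloned()
                    .ok_or_else(|| "dataset has no columns".to_string());
            }
            usize::try_from(id)
                .ok()
                .and_then(|i| columns.get(i))
                .cloned()
                .ok_or_else(|| format!("projected column id {} out of range", id))
        })
        .collect()
}
```

Returning a Result rather than indexing directly also means any other unexpected id surfaces as an error message instead of aborting the host duckdb process.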

eddyxu avatar May 18 '23 23:05 eddyxu