
_rowaddr and _rowid not exposed for `merge_insert`

Open oceanusxiv opened this issue 1 year ago • 2 comments

Sort of a follow-up on #3251: I noticed that _rowid and _rowaddr don't seem to be usable for merge_insert, while they work for merge. When I try to use them with a sub-column update, something like

import lance
import polars as pl
import pyarrow as pa

initial_data = pa.table(
    {
        "a": range(10),
        "b": range(10),
        "c": range(10, 20),
    }
)

dataset = lance.write_dataset(
    initial_data, "/tmp/lance/test2.lance"
)

new_values = pl.from_arrow(dataset.to_table(with_row_id=True)).select(pl.col("_rowid"), pl.col("a") * 2)

(dataset.merge_insert("a").when_matched_update_all().execute(new_values))

gives me

OSError: Append with different schema: fields did not match, missing=[b, c], unexpected=[_rowid], location: /Users/runner/work/lance/lance/rust/lance-core/src/datatypes/schema.rs:142:27
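For anyone reading the error above, here is a minimal sketch of how the `missing` and `unexpected` lists relate to the two schemas. This is not Lance's actual check (that lives in `schema.rs`); `schema_diff` is a hypothetical helper written only to illustrate that the source table lacks `b` and `c` and carries an extra `_rowid` column.

```python
def schema_diff(target_fields, source_fields):
    """Compare two lists of field names (hypothetical helper, not a Lance API).

    Returns (missing, unexpected):
      missing    - fields the target has but the source lacks
      unexpected - fields the source has but the target lacks
    """
    missing = [name for name in target_fields if name not in source_fields]
    unexpected = [name for name in source_fields if name not in target_fields]
    return missing, unexpected


# Field names taken from the report: the dataset has a, b, c;
# the new_values frame has only _rowid and a.
missing, unexpected = schema_diff(["a", "b", "c"], ["_rowid", "a"])
print(f"missing={missing}, unexpected={unexpected}")
```

In other words, `when_matched_update_all()` appears to expect the source to match the full dataset schema, so a two-column frame is rejected before any matching on the key happens.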

oceanusxiv avatar Feb 09 '25 08:02 oceanusxiv

I think this is quite different from #3251. Because _rowid is managed by Lance, we cannot insert _rowid into Lance.

chenkovsky avatar Feb 26 '25 05:02 chenkovsky

If I’m not mistaken, this doesn’t have anything to do with merge_insert, does it? You just want to update one or more specific columns, right?

https://github.com/lance-format/lance/pull/4715 already covers this for Fragments. @wjones127, should we expose an API at the dataset level? 🤔

aheev avatar Jan 02 '26 16:01 aheev