AttributeError: 'NoneType' object has no attribute 'get' on `dataset.filter`
Looks like filter is broken?
Repro:
import lance
ds = lance.dataset(path)
ds.filter("split = 'test'")
>>>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "pyarrow/_dataset.pyx", line 796, in pyarrow._dataset.Dataset.filter
AttributeError: 'NoneType' object has no attribute 'get'
Looks like in that pyarrow file, self._scan_options is None when this line runs:
current_filter = self._scan_options.get("filter")
Version:
- pylance=='0.16.1'
- pyarrow=='17.0.0'
Is this a version issue? I tried downgrading to pyarrow=='12.0.0' but I'm still running into the error.
Try split == 'test'
Same issue
I'm not sure if it does what you're expecting from .filter, but ds.to_table(filter="split == 'test'") should work if split is a column.
I was originally trying to solve https://github.com/lancedb/lance/issues/2778 when I ran into this issue.
I'm building a custom torch dataloader for a 10TB dataset, so materializing it isn't an option.
I get the same error. Putting a simple repro here:
import lance
import pyarrow as pa
table = pa.Table.from_pylist([{"name": "Alice", "age": 20},
{"name": "Bob", "age": 30}])
lance.write_dataset(table, "./alice_and_bob.lance", mode="overwrite")
ds = lance.dataset("./alice_and_bob.lance")
ds.filter("age == 30")
As far as I can tell, the above is the intended use according to the API spec.
Hmm, yes. We extend pyarrow.dataset.Dataset because we want to appear as a pyarrow dataset, since there is no dataset protocol at the moment. For example, this is how DuckDB is able to query us (it thinks we are a pyarrow dataset).
In this case it looks like pyarrow has this function: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.filter
We are not overriding it, so the call falls back to the underlying pyarrow implementation (which isn't meant to be used). We should override this method and provide some kind of implementation.
The same probably goes for sort_by, join (we should nicely report that it isn't supported), join_asof (same, not supported), and replace_schema (again, not supported).
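A rough sketch of the "nicely report it isn't supported" idea, using a stand-in base class rather than pyarrow.dataset.Dataset itself. BaseDataset, WrappedDataset, _unsupported, and the error message are all hypothetical names chosen for illustration; this is not Lance's actual implementation.

```python
class BaseDataset:
    """Stand-in for the inherited base class (hypothetical)."""

    def __init__(self):
        # Mimics the failure mode in the traceback: this state is never set.
        self._scan_options = None

    def join(self, *args, **kwargs):
        # Fails with AttributeError because _scan_options is None.
        return self._scan_options.get("join")


def _unsupported(name):
    """Build a method that raises a clear error instead of falling through."""

    def method(self, *args, **kwargs):
        raise NotImplementedError(f"{name}() is not supported by this dataset")

    method.__name__ = name
    return method


class WrappedDataset(BaseDataset):
    """Hypothetical subclass that overrides inherited methods explicitly."""

    join = _unsupported("join")
    join_asof = _unsupported("join_asof")
    replace_schema = _unsupported("replace_schema")
```

With overrides like these, a caller gets a NotImplementedError that names the method, rather than an AttributeError raised from inside the base class.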