lance
Dataset::take() crashes on Lance data stored on S3
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/tmp/ipykernel_265067/3254674135.py", line 18, in read
File "/root/miniconda3/lib/python3.8/site-packages/lance/dataset.py", line 213, in take
return pa.Table.from_batches([self._ds.take(indices)])
OSError: LanceError(I/O): Generic S3 error: Error performing get request path/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance: response error "<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><Key>path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance</Key><BucketName>bucket</BucketName><Resource>/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance</Resource><RequestId>1750EA772DF82272</RequestId><HostId>9bb5501a-8602-4db7-9d0a-b44760bcc6e1</HostId></Error>", after 0 retries: HTTP status client error (416 Range Not Satisfiable) for url (http://s3-labeling.foo.com/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance)
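For anyone debugging this: under HTTP range semantics (RFC 7233), a 416 / InvalidRange response means the requested byte range starts at or beyond the end of the object. Here is a minimal sketch of that condition. `is_range_satisfiable` is a hypothetical helper written for illustration and is not part of lance, and S3-compatible servers may differ in edge cases.

```python
from typing import Optional


def is_range_satisfiable(first: int, last: Optional[int], size: int) -> bool:
    """Check whether `Range: bytes=first-last` can be served from an object of `size` bytes.

    `last=None` models an open-ended range such as `bytes=100-`.
    Hypothetical helper for illustration; not a lance or S3 API.
    """
    if first < 0 or size < 0:
        raise ValueError("first and size must be non-negative")
    if last is not None and last < first:
        raise ValueError("last must not be smaller than first")
    # The range is satisfiable only if its first byte lies inside the object.
    # A `last` past the end is fine: the server truncates the response.
    return first < size


# A 100-byte object: bytes 0-99 exist, so a range starting at 100 cannot be served.
print(is_range_satisfiable(0, 99, 100))
print(is_range_satisfiable(100, 199, 100))
```

If this is what is happening here, comparing the object's actual size on S3 with the offsets being requested would be one thing to check.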
Do you have concrete code or a resource path to reproduce this?
http://s3-labeling.foo.com/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance
is available, but I guess that is part of a bigger path?
I have the same issue. Maybe it would help if you could pass in an S3File object, so that credentials can be passed through.