
Dataset::take() crashes on Lance data stored on S3

Open eddyxu opened this issue 2 years ago • 2 comments

org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/ipykernel_265067/3254674135.py", line 18, in read
  File "/root/miniconda3/lib/python3.8/site-packages/lance/dataset.py", line 213, in take
    return pa.Table.from_batches([self._ds.take(indices)])
OSError: LanceError(I/O): Generic S3 error: Error performing get request path/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance: response error "<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRange</Code><Message>The requested range is not satisfiable</Message><Key>path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance</Key><BucketName>bucket</BucketName><Resource>/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance</Resource><RequestId>1750EA772DF82272</RequestId><HostId>9bb5501a-8602-4db7-9d0a-b44760bcc6e1</HostId></Error>", after 0 retries: HTTP status client error (416 Range Not Satisfiable) for url (http://s3-labeling.foo.com/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance)
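For context on the `416 Range Not Satisfiable` status in the trace above: under general HTTP range semantics, a server rejects a byte range whose first position is at or beyond the end of the object. The sketch below is only an illustration of that rule, not Lance's or the S3 client's code. The function name `satisfiable_range` is hypothetical, and it handles only the `bytes=first-last` and `bytes=first-` forms. It does not diagnose this issue; it just shows which requests would produce this class of error for a given object size.

```python
from typing import Optional, Tuple


def satisfiable_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range HTTP Range header against an object of `size` bytes.

    Returns the inclusive (first, last) byte positions a server would serve,
    or None when the range is not satisfiable (the 416 case).
    Hypothetical helper for illustration; suffix ranges ("bytes=-N") and
    multi-range headers are deliberately not supported.
    """
    unit, eq, spec = header.partition("=")
    if not eq or unit.strip() != "bytes" or "," in spec:
        raise ValueError(f"unsupported Range header: {header!r}")

    first_s, dash, last_s = spec.strip().partition("-")
    if not dash or first_s == "":
        raise ValueError(f"unsupported Range header: {header!r}")

    first = int(first_s)
    if last_s and int(last_s) < first:
        # Syntactically invalid range (last position before first position).
        raise ValueError(f"invalid Range header: {header!r}")

    # The range is unsatisfiable when it starts at or past the end of the object.
    if first >= size:
        return None

    # An open-ended or overlong range is clamped to the last byte of the object.
    last = min(int(last_s), size - 1) if last_s else size - 1
    return first, last


if __name__ == "__main__":
    object_size = 1000  # assumed size, for illustration only
    for rng in ("bytes=0-99", "bytes=900-1999", "bytes=1000-1099"):
        print(rng, "->", satisfiable_range(rng, object_size))
```

Comparing the actual size of the `.lance` object with the byte range the reader requests would show whether the failing request falls into the `None` case.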

eddyxu • Mar 29 '23 17:03

Do you have concrete code or a resource path to reproduce this? http://s3-labeling.foo.com/bucket/path/stageB_lance_test/2023-02-25/9940C/data/6746943f-bdec-48d9-8701-522277132e56.lance is available, but I guess that is part of a bigger path?

Renkai • Apr 04 '23 01:04

I have the same issue. Maybe it would help if you could pass through an S3File object, so that credentials can be passed through as well.

JSpenced • Apr 10 '23 13:04