
Limit parallelism in `Dataset.cleanup_old_versions`

Open · tonyf opened this issue 1 year ago · 7 comments

I'm running into S3 rate limits when trying to clean up a very large dataset with `dataset.cleanup_old_versions`. I can't seem to control this via `LANCE_IO_THREADS`.

tonyf, Aug 28 '24 21:08

This could be due to the hard-coded concurrency in `object_store`: https://github.com/apache/arrow-rs/blob/a937869f892dc12c4730189e216bf3bd48c2561d/object_store/src/aws/mod.rs#L252

We might need to make this controllable upstream somehow.

wjones127, Aug 28 '24 21:08
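To make the concurrency concern concrete, here is a minimal Python sketch of deleting keys in batches with a caller-chosen cap on in-flight requests. It is not Lance's or `object_store`'s code: `chunk`, `delete_bounded`, `delete_batch`, and `demo` are hypothetical names, and `delete_batch` stands in for whatever issues one S3 DeleteObjects request (S3 accepts at most 1,000 keys per such request). The point is that the cap is a parameter rather than a constant, which is the kind of control the comment above suggests exposing upstream.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

# S3 DeleteObjects accepts at most 1,000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def chunk(keys: Sequence[str], size: int) -> List[List[str]]:
    """Split keys into consecutive batches of at most `size` keys."""
    return [list(keys[i : i + size]) for i in range(0, len(keys), size)]


async def delete_bounded(
    keys: Sequence[str],
    delete_batch: Callable[[List[str]], Awaitable[None]],
    max_in_flight: int,
) -> int:
    """Delete `keys` in batches, never running more than `max_in_flight`
    batch requests at once. Returns the number of batches issued."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    semaphore = asyncio.Semaphore(max_in_flight)
    batches = chunk(keys, MAX_KEYS_PER_REQUEST)

    async def run(batch: List[str]) -> None:
        async with semaphore:  # waits while max_in_flight batches are running
            await delete_batch(batch)

    await asyncio.gather(*(run(b) for b in batches))
    return len(batches)


async def demo() -> Tuple[int, int]:
    """Delete 5,500 fake keys with at most 2 batches in flight and report
    (batches issued, peak concurrency observed)."""
    in_flight = 0
    peak = 0

    async def fake_delete(batch: List[str]) -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # pretend to wait on S3
        in_flight -= 1

    keys = [f"data/{i}.lance" for i in range(5500)]
    batches = await delete_bounded(keys, fake_delete, max_in_flight=2)
    return batches, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (6, 2)
```

Because the semaphore gates each batch, lowering `max_in_flight` trades cleanup speed for fewer concurrent requests, which is the lever needed to stay under S3 request-rate limits.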

Hm, is there any way to temporarily monkeypatch Rust-level code from Python?

tonyf, Aug 28 '24 22:08

Actually, we don't use `delete_stream` (mainly by chance), so we probably don't need to worry about `object_store`. I suspect this is fixed in 0.17.0b9 (released yesterday) via https://github.com/lancedb/lance/pull/2773

We were previously using `num_cpus::get` and are now using `LANCE_IO_THREADS`.

westonpace, Aug 28 '24 22:08
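A minimal sketch of the thread-count selection described in the comment above, assuming that a positive integer in `LANCE_IO_THREADS` wins and anything else falls back to the CPU count (the `num_cpus::get` behavior). Lance reads this variable on the Rust side; `resolve_io_threads` is a hypothetical Python stand-in, and its parsing and fallback rules are assumptions rather than Lance's exact logic.

```python
import os
from typing import Mapping, Optional


def resolve_io_threads(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: pick an I/O concurrency limit.

    Assumed rules: a positive integer in LANCE_IO_THREADS is used as-is;
    a missing, non-numeric, or non-positive value falls back to the CPU count.
    """
    env = os.environ if env is None else env
    raw = env.get("LANCE_IO_THREADS")
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # treat garbage as "unset"
        if value > 0:
            return value
    # Fallback mirrors a num_cpus::get-style default; os.cpu_count() may be None.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_io_threads({"LANCE_IO_THREADS": "8"}))  # 8
```

A setting like this only limits requests that actually flow through the code path that reads it, which is the question the next comments dig into.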

> Actually, we don't use `delete_stream` (mainly by chance), so we probably don't need to worry about `object_store`.

What makes you say that? I see us call `remove_stream` here:

https://github.com/lancedb/lance/blob/2f25fc473dd69c8bc298c4f4e171b81f87660656/rust/lance/src/dataset/cleanup.rs#L289

Which dispatches to `delete_stream` here:

https://github.com/lancedb/lance/blob/2f25fc473dd69c8bc298c4f4e171b81f87660656/rust/lance-io/src/object_store.rs#L614-L615

wjones127, Aug 28 '24 22:08

I'm now getting this error:

OSError: LanceError(IO): Generic S3 error: Got invalid DeleteObjects response: unknown variant `Code`, expected `Deleted` or `Error`, /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/fns.rs:368:13

Maybe this is happening because a previous cleanup operation failed without marking the version as deleted, so it's now getting a not-found error? I'm not sure how to work around this.

tonyf, Aug 29 '24 01:08

> What makes you say that? I see us call `remove_stream` here:

Ah, I was just searching for `delete_stream`, saw the parallelism on `old_manifests`, and assumed that was it. My mistake.

westonpace, Aug 29 '24 01:08

> OSError: LanceError(IO): Generic S3 error: Got invalid DeleteObjects response: unknown variant `Code`, expected `Deleted` or `Error`, /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/fns.rs:368:13

That's a new one for me. It almost looks like a malformed S3 response.

westonpace, Aug 29 '24 17:08
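One possible reading of that error, offered as a hypothesis rather than a diagnosis: a well-formed DeleteObjects response is a `<DeleteResult>` whose children are only `<Deleted>` and `<Error>` entries, whereas an S3 error body (for example a `SlowDown` throttling response) is a root `<Error>` element whose first child is `<Code>`. A strict parser expecting the former but handed the latter would complain about an unknown variant `Code` where it expected `Deleted` or `Error`. The sketch below mimics such a strict parser with Python's standard `xml.etree.ElementTree`; `parse_delete_result` is a hypothetical name, and the sample bodies in the usage are illustrative, not captured from this bucket.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}Deleted' -> 'Deleted'."""
    return tag.rsplit("}", 1)[-1]


def parse_delete_result(body: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (deleted_keys, [(key, error_code), ...]) from a DeleteObjects body.

    Strict on purpose: every child of the root must be <Deleted> or <Error>.
    """
    root = ET.fromstring(body)
    deleted: List[str] = []
    errors: List[Tuple[str, str]] = []
    for child in root:
        name = _local(child.tag)
        fields = {_local(f.tag): (f.text or "") for f in child}
        if name == "Deleted":
            deleted.append(fields.get("Key", ""))
        elif name == "Error":
            errors.append((fields.get("Key", ""), fields.get("Code", "")))
        else:
            raise ValueError(
                f"unknown variant `{name}`, expected `Deleted` or `Error`"
            )
    return deleted, errors


if __name__ == "__main__":
    ok = (
        '<DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
        "<Deleted><Key>a</Key></Deleted>"
        "<Error><Key>b</Key><Code>AccessDenied</Code></Error>"
        "</DeleteResult>"
    )
    print(parse_delete_result(ok))  # (['a'], [('b', 'AccessDenied')])

    throttled = "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message></Error>"
    try:
        parse_delete_result(throttled)
    except ValueError as e:
        print(e)  # unknown variant `Code`, expected `Deleted` or `Error`
```

If this reading is right, the error would be another symptom of the request-rate problem rather than of a missing object.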