Sujee Maniyam

Results 27 comments of Sujee Maniyam

Yes, the error is quite obvious 🤣 My suspicion is that it's caused by a race condition between workers trying to clean up downloaded artifacts. Adding: I see this consistently on Google...

The hash is based on file content, treating it as a bunch of bytes. So yes, we do need to read the files, but there is no need to process them (no pdf2pq...
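A minimal sketch of the byte-level hashing described above, for illustration only. The function names `file_digest` and `find_duplicates`, the choice of SHA-256, and the chunk size are all assumptions here, not what the actual transform uses:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return a hex digest of the file's raw bytes (SHA-256 is an assumed choice)."""
    h = hashlib.sha256()
    # Open in binary mode: the file is never parsed, only read as bytes.
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, first_seen) pairs for files with identical bytes."""
    first_seen = {}  # digest -> first path seen with that content
    dupes = []
    for p in paths:
        d = file_digest(p)
        if d in first_seen:
            dupes.append((p, first_seen[d]))
        else:
            first_seen[d] = p
    return dupes
```

Because only raw bytes are compared, two files with the same visible content but different encodings or metadata would not be flagged as duplicates under this sketch.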

> @sujee how would you treat zip and tar files?

For the first version, I plan to treat zip/tar files as ONE file. So if there are duplicate zip/tar files, dupes will...

https://github.com/sujee/data-prep-kit/commit/08024dc3b049ca69bf4ffa84352754867dbd3f79 makes the required changes. Related: #585

I have made the necessary changes on my branch and will submit a PR soon.

> @sujee do you think the error is specific to ededup or does it occur for all ray transforms? thanks

I think this is more Ray related. Probably need to...

Here is a similar example: https://pypi.org/project/tf-nightly/ ![image](https://github.com/user-attachments/assets/fd5570bc-0726-47a5-8a5a-093b489c1dfc)