"Transfer as ZIP" function as a toggle in the cloud GUI
The code was written with Claude Code, but I did review it to make sure it wasn't doing anything unreasonable.
After looking further, the slow individual file transfers seem to come down to latency on the small files. Because of where I live, I get at best 160 ms to a datacenter on runpod/vast, so in that edge case zips may be the better option. The best option would still be Hugging Face, though. See my write-up below:
In this test I reuse the same runpod instance for every transfer to avoid variance between instances, and I set the min download to 2000.
Testing Settings:
File Sync Method: NATIVE_SCP
1 Concept using subdirs (8 directories)
I also tested the ZIP function with the directories set up as separate concepts; it just zips each concept separately and uploads them. For the non-zipped transfer this distinction doesn't affect performance.
I time it from when I click start training (it's an instance that's already started, initialized, updated, etc.) and stop when it finishes uploading and starts caching.
Testing Specs:
Location: ZA
Down: 1000 Mbps
Up: 250 Mbps
Disk: 3000 MB/s
Runpod specs:
Location: CA
Down: 2350 Mbps
Up: 2027 Mbps
Disk: 1300 MB/s
Test Dataset:
Size: 1784 MB
Total Images: 5,111
Individual Transfer time:
Upload Speed: 3-4 Mbps average
Upload Speed (worker size set to 10): 10 Mbps average
1 Concept: 1 Hour, 5 Minutes, 9 Seconds
1 Concept (worker size set to 10):
Zipped Transfer time:
Upload Speed: 70 Mbps average
1 Concept: 4 Minutes, 57 Seconds
8 Concepts: 6 Minutes, 45 Seconds
For the worker count I tried going to 16 but had to settle for 10, as I got random errors. I think that's just because of the number of connections I'm opening.
It's about twice as fast with the higher worker count! But the packing solution still seems better with the current implementation we're using for file transfer.
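As a rough sanity check on the latency explanation, the reported timings can be turned into a per-file cost. This is only a back-of-the-envelope sketch: it assumes 1 MB = 8 Mb, treats the 160 ms figure as the round-trip time, and the helper names are made up for illustration.

```python
# Figures taken from the test results in this write-up.
DATASET_MB = 1784   # total dataset size
NUM_FILES = 5111    # total images
RTT_S = 0.160       # best-case latency to the datacenter (assumed to be round trip)


def seconds(hours=0, minutes=0, secs=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


def effective_mbps(size_mb, elapsed_s):
    """End-to-end throughput in megabits per second (assumes 1 MB = 8 Mb)."""
    return size_mb * 8 / elapsed_s


individual_s = seconds(1, 5, 9)   # 1 concept, individual file transfer
zipped_s = seconds(0, 4, 57)      # 1 concept, zipped transfer

# Average wall-clock cost of one file, and how many round trips that equals.
per_file_s = individual_s / NUM_FILES
round_trips_per_file = per_file_s / RTT_S

print(f"individual: {effective_mbps(DATASET_MB, individual_s):.2f} Mbps end to end")
print(f"zipped:     {effective_mbps(DATASET_MB, zipped_s):.2f} Mbps end to end")
print(f"per file:   {per_file_s:.3f} s, about {round_trips_per_file:.1f} round trips")
print(f"speedup:    {individual_s / zipped_s:.1f}x")
```

The individual transfer works out to roughly 3.65 Mbps end to end, which lines up with the observed 3-4 Mbps, and to about 0.76 s per file, or a bit under five round trips at 160 ms. That is consistent with per-file latency, not bandwidth, being the bottleneck. The zipped end-to-end figure comes out lower than the 70 Mbps seen during upload because the timing window also includes everything other than the upload itself, such as packing.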
pre-commit.ci autofix
Using tar files now; the zip approach was unnecessarily complicated.
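For anyone following along, packing a concept directory into a single tar can be sketched with the standard library alone. This is not the code in this PR, just a minimal illustration: `pack_concept` and `unpack_concept` are hypothetical names, and leaving the archive uncompressed is an assumption made here on the grounds that image files are usually already compressed.

```python
import tarfile
from pathlib import Path


def pack_concept(concept_dir: Path, archive_path: Path) -> int:
    """Pack every file under concept_dir into one uncompressed tar.

    archive_path should live outside concept_dir so the archive does not
    end up containing itself. Returns the number of files packed.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(concept_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the concept root so subdirs survive.
                tar.add(path, arcname=path.relative_to(concept_dir).as_posix())
                count += 1
    return count


def unpack_concept(archive_path: Path, dest_dir: Path) -> None:
    """Extract the tar into dest_dir, recreating the subdirectory layout."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(dest_dir, **kwargs)
```

With this shape, one archive per concept goes over the wire in a single transfer, so the per-file latency cost is paid once per concept instead of once per image.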
I have not heard from the author, and the code is not in a good spot. Closing for now. I will reinvestigate this at some point.