Rishabh Mishra
Guys, can someone please answer this? I am facing the same issue. Please share a code snippet to fine-tune the entire network.
@knighton the same behaviour persists even with a single data loader when streaming from remote. @snarayan21 will keeping the workers alive completely solve the util drop, or will it only...
@snarayan21 we did try setting `persistent_workers=True`, but this did not help. Attaching the graph for reference; there's always a 30 min drop after each epoch ![Uploading Screenshot 2024-04-03 at 10.25.47...
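For reference, a minimal sketch of the loader setup being discussed (paths, worker counts, and batch sizes are illustrative placeholders, not the actual training script):

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# StreamingDataset takes the per-device (local) batch size so it can
# partition samples correctly across ranks and workers.
dataset = StreamingDataset(
    remote="s3://my-bucket/dataset",  # placeholder remote path
    local="/tmp/streaming_cache",     # placeholder local cache dir
    batch_size=512,                   # local batch size, matching the DataLoader
)

loader = DataLoader(
    dataset,
    batch_size=512,
    num_workers=8,               # illustrative worker count
    persistent_workers=True,     # keep workers alive across epochs
    prefetch_factor=2,
)
```

`persistent_workers=True` avoids re-forking worker processes at each epoch boundary, which is the usual first fix for inter-epoch stalls; the discussion above suggests that alone may not eliminate the drop when streaming from remote.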
@snarayan21 I am passing the local batch size to StreamingDatasets. I have shared the scripts with databricks team, they will get in touch with you
And I think the wait between epochs is also batch-size dependent: the higher the batch size, the higher the wait time.
@snarayan21 yeah, there's light preprocessing happening: basically a lookup (O(1)) and converting NumPy arrays to torch tensors. local_batch_size in my case is 512 and global is 4096; I train for...
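As a concrete illustration of that preprocessing (a hypothetical sketch, not the actual code): an O(1) dict lookup per sample followed by an array conversion. In the real pipeline the final step would be `torch.from_numpy(features)`; NumPy stands in here so the sketch is self-contained.

```python
import numpy as np

# Hypothetical label lookup table; the real mapping is built elsewhere.
label_map = {"cat": 0, "dog": 1}

def preprocess(sample):
    """Per-sample transform: O(1) dict lookup plus array conversion.

    The real pipeline would end with torch.from_numpy(features);
    a NumPy array stands in here so the sketch runs without torch.
    """
    features = np.asarray(sample["features"], dtype=np.float32)
    label = label_map[sample["label"]]  # O(1) lookup
    return features, label

features, label = preprocess({"features": [0.1, 0.2], "label": "dog"})
```

Preprocessing this cheap shouldn't dominate the epoch-boundary stall, which is consistent with the wait time scaling with batch size rather than with per-sample work.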