Jayasimha T

Results: 6 comments by Jayasimha T

Debugging so far. @leoozy, @peteriz: the program is getting stuck at this step: https://github.com/IntelLabs/academic-budget-bert/blob/04f6da685acf4dfc47b85b42307e17340e87fde3/run_pretraining.py#L219 On the surface, it looks like an issue with DeepSpeed or the way in which...

This issue is probably related to https://github.com/Lightning-AI/lightning/issues/13498

@leoozy @peteriz Update: I raised this issue in the DeepSpeed repo. It turns out DeepSpeed does not support a variable number of batches in each process. To quote their response: "**The common practice...
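For illustration only (this is not taken from the thread or the repository): a minimal sketch of one way to give every process the same number of batches, by truncating each rank's list to the shortest one. The function name `equalize_batch_counts` and the simulated ranks are hypothetical. In a real distributed job each rank would only know its own count, and the minimum would typically be obtained with a collective such as `torch.distributed.all_reduce` with `ReduceOp.MIN`.

```python
from typing import Dict, List, Sequence


def equalize_batch_counts(batches_per_rank: Dict[int, Sequence]) -> Dict[int, List]:
    """Truncate every rank's batch list to the length of the shortest one.

    Hypothetical helper: ranks are simulated as dictionary keys so the
    sketch runs in a single process without any distributed backend.
    """
    # Smallest batch count over all ranks; every rank can run this many
    # steps without another rank waiting on it in a collective call.
    common = min(len(batches) for batches in batches_per_rank.values())
    return {rank: list(batches[:common]) for rank, batches in batches_per_rank.items()}


# Assumed example data: three ranks holding 5, 3 and 4 batches.
uneven = {0: list(range(5)), 1: list(range(3)), 2: list(range(4))}
even = equalize_batch_counts(uneven)
```

Truncation drops the surplus batches on the larger ranks. Padding the smaller ranks with repeated batches would be the alternative if no data should be skipped.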

Sure, @peteriz. I will try this over the weekend.

Hi, I evaluated two approaches for refactoring the dataset iterator: 1. Use shared memory to share imbalanced data between processes. However, shared memory has to be allocated before creating worker processes....

Hi @peteriz, I was going through the code once again. I think global_rank=0 should **not** be deleted (https://github.com/IntelLabs/academic-budget-bert/issues/22#issuecomment-1173159490). Even though each process reads the same file, since we are...