effusiveperiscope

4 comments by effusiveperiscope

I have experienced this before in a few situations:

- actual model parameters are not being loaded from the checkpoint (there is some weird naming error involving "module" prefix between...
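For anyone hitting the "module" prefix mismatch, here is a minimal sketch of one common workaround, not necessarily what was done in this case. It assumes the checkpoint keys carry a leading `module.` (as typically happens when weights are saved from a DataParallel/DDP-wrapped model) while the model being loaded is unwrapped. The helper names `strip_module_prefix` and `report_key_mismatch` are made up for illustration, and the code works on any plain key-to-value mapping.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Hypothetical helper: assumes the prefix only needs removing, not adding.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


def report_key_mismatch(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output.

    missing:    keys the model expects but the checkpoint does not contain
    unexpected: keys in the checkpoint that the model does not have
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected
```

With PyTorch, something like `model.load_state_dict(strip_module_prefix(ckpt))` would then apply the cleaned keys. Comparing `model.state_dict().keys()` against the checkpoint keys first is a quick way to confirm whether parameters are actually being matched, since loading with `strict=False` skips mismatched keys without raising.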

Has anyone here tried the fix by @stevenhillis?

> > Has anyone here tried the fix by @stevenhillis?
>
> where is the fix?

`broadcast_buffers=False` in `DistributedDataParallelKwargs`. Seems to work OK if you remove the isnan() check (I...
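For reference, a rough sketch of how that setting is usually passed when training with Hugging Face Accelerate. This assumes the training script builds its own `Accelerator`; the variable names are placeholders, and it does not cover the isnan() check mentioned above.

```python
from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# Ask DDP not to broadcast module buffers (e.g. BatchNorm running stats)
# from rank 0 to the other ranks at each forward pass.
ddp_kwargs = DistributedDataParallelKwargs(broadcast_buffers=False)

# kwargs_handlers forwards these options to torch's DistributedDataParallel
# when the model is wrapped by accelerator.prepare(...).
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```

The option only takes effect in multi-GPU runs where the model is wrapped in DDP; in a single-process run it should make no difference.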

Doesn't seem to work with slmadv training though.