effusiveperiscope
I have experienced this before in a few situations:
- actual model parameters are not being loaded from the checkpoint (there is some weird naming error involving "module" prefix between...
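For anyone hitting the "module" prefix case, here is a rough sketch of how the key names can be checked and normalized before loading. It assumes the checkpoint is a flat mapping of parameter names to values and that the mismatch is a leading "module." on the keys; `strip_module_prefix` and `report_key_mismatch` are hypothetical helper names, not something from the repo.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Keys that do not start with the prefix are copied unchanged.
    (Hypothetical helper; the "module." default is an assumption.)
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


def report_key_mismatch(expected_keys, loaded_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing: names the model expects but the checkpoint lacks.
    unexpected: names in the checkpoint that the model does not have.
    """
    expected, loaded = set(expected_keys), set(loaded_keys)
    return sorted(expected - loaded), sorted(loaded - expected)
```

With PyTorch, the cleaned mapping could then be passed to `model.load_state_dict(..., strict=True)`, which raises on any leftover missing or unexpected keys instead of silently skipping them.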
Has anyone here tried the fix by @stevenhillis?
> > Has anyone here tried the fix by @stevenhillis?
>
> where is the fix?

broadcast_buffers=False in DistributedDataParallelKwargs. Seems to work OK if you remove the isnan() check (I...
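In case it helps, this is roughly what that setting looks like in code. It is only a sketch and assumes the training script creates its Accelerator through Hugging Face accelerate; `build_accelerator` is a hypothetical wrapper name, and any other arguments the real script passes to Accelerator are not shown.

```python
def build_accelerator():
    """Create an Accelerator whose DDP wrapper does not broadcast buffers.

    Hypothetical helper. The import is kept inside the function so the
    snippet can be loaded on a machine without accelerate installed.
    """
    from accelerate import Accelerator, DistributedDataParallelKwargs

    # broadcast_buffers=False is forwarded to torch's DistributedDataParallel,
    # so module buffers are no longer synced from rank 0 on each forward pass.
    ddp_kwargs = DistributedDataParallelKwargs(broadcast_buffers=False)
    return Accelerator(kwargs_handlers=[ddp_kwargs])
```

The returned object would replace the existing Accelerator in the training script, with the rest of the setup left as it is.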
Doesn't seem to work with slmadv training though.