zhhao1
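Hello, has this problem been solved?
I found a solution. On a single GPU I printed next(self.parameters()).dtype, and every value was torch.float32, so this is probably a version issue. Simply replacing it works.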
My experience:
model.half() + adam(eps=1e-8): loss is nan
model.half() + sgd: loss is normal, but training does not converge
model.half() + adam(eps=1e-4): loss is normal, but training does not converge
model.half() + fp16: loss is normal, but training does not converge
model (without .half()) + adam(eps=1e-8): loss is normal, training converges
Remove .half() can...
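One plausible reason for the nan with model.half() and adam(eps=1e-8), offered as a guess rather than a confirmed diagnosis: 1e-8 is smaller than anything float16 can represent, so it rounds to zero and Adam's denominator can become exactly zero for small gradients. The sketch below uses only the Python standard library to round values to half precision. The helper names (to_fp16, adam_denominator_fp16) and the example gradient of 1e-4 are made up for illustration, and bias correction and the beta2 factor are left out.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def adam_denominator_fp16(grad: float, eps: float) -> float:
    """Return sqrt(v) + eps as float16 would store it, taking v = grad**2.

    Simplified on purpose: no beta2 averaging and no bias correction.
    """
    v = to_fp16(to_fp16(grad) ** 2)  # squared gradient, stored in fp16
    return to_fp16(math.sqrt(v) + to_fp16(eps))


# 1e-8 is below the smallest fp16 subnormal (about 6e-8), so it rounds to 0.0.
eps_default = to_fp16(1e-8)
# 1e-4 is representable in fp16, so it stays nonzero.
eps_larger = to_fp16(1e-4)

# Hypothetical small gradient: its square also underflows to 0.0 in fp16.
denom_default = adam_denominator_fp16(grad=1e-4, eps=1e-8)
denom_larger = adam_denominator_fp16(grad=1e-4, eps=1e-4)

print("eps=1e-8 in fp16:", eps_default)
print("eps=1e-4 in fp16:", eps_larger)
print("denominator with eps=1e-8:", denom_default)
print("denominator with eps=1e-4:", denom_larger)
```

A zero denominator would fit with the nan disappearing at eps=1e-4, but it does not explain the non-convergence in the other half-precision runs. Only the full-precision run is reported as converging.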
> > My experience:
> > model.half() + adam(eps=1e-8): loss is nan
> > model.half() + sgd: loss is normal, but training does not converge
> > model.half() + adam(eps=1e-4): loss is normal, but training does not converge
> > model.half() + fp16: loss is normal, but training does not converge
> > model (without .half()) + adam(eps=1e-8): loss is normal, training converges
> > Remove...
> The first release of Distil-Whisper will be for English. We'll be releasing training code next week to facilitate anyone in the community to distill Whisper on their choice of...