nihaoUCAS
> Hmm interesting.. Is this the result of 0.0.1a4 version? > And How did you guys print out that result? In version 0.0.01a3, the result is printed out by the `bert` command...
I guess U = len() + len(labels), where len() = 1. It shouldn't be in the labels, but in the encoder logits.
@iamxiaoyubei Did you solve the problem? I'm running into the same problem.
I solved the problem by setting: `export TF_CUDNN_USE_AUTOTUNE=0`
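If you launch from a Python script instead of a shell, the same setting can be sketched as below. This assumes the variable has to be in the environment before TensorFlow is imported; the helper name `disable_cudnn_autotune` is made up for illustration.

```python
import os


def disable_cudnn_autotune() -> None:
    """Set TF_CUDNN_USE_AUTOTUNE=0 for the current process (hypothetical helper)."""
    os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"


disable_cudnn_autotune()

# Assumption: TensorFlow reads the variable at startup, so import it only
# after the call above.
# import tensorflow as tf
```

Child processes started from this script inherit the value, so it should behave like the `export` in the shell.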
watch this.
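In principle, all of them are supported: you simply multiply the two QLoRA matrices together, scale the result, and add it to the original matrix.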
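A rough sketch of that merge step in plain Python, not the project's actual code. The names `merge_lora`, `lora_a`, `lora_b`, `alpha` and `r` are my own, and the scale `alpha / r` is the usual LoRA convention, assumed here. It also assumes the original matrix is already in full precision (a quantized base would need dequantizing first).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return w + (alpha / r) * (lora_b @ lora_a).

    w:      out x in   original weight matrix
    lora_b: out x r    adapter matrix
    lora_a: r x in     adapter matrix
    """
    scale = alpha / r                 # assumed LoRA scaling convention
    delta = matmul(lora_b, lora_a)    # low-rank update, shape out x in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Tiny example: 2x2 identity weight with a rank-1 adapter.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # out=2, r=1
lora_a = [[0.5, 0.5]]     # r=1, in=2
merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, since their effect is folded into the weight.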
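> > > Running the command `deepspeed --num_gpus=2 train.py --train_args_file train_args/sft.json` raises an error > > > The deepspeed config file ds_z3_config.json does not seem to be used anywhere in the project? > > > > > > `train.py: error: unrecognized arguments: --local_rank=1` > > I get this error too. Did you manage to solve it? > > Yes, it is probably a torch version issue: my version was too new (2.0) and was not the one from the installation docs (1.13)...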
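Separately from the version fix, the error message itself just means the script's argument parser does not know the `--local_rank` flag it was handed. Below is a minimal `argparse` sketch of a parser that tolerates it. This is not the project's `train.py`; `parse_args` and the defaults are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse training arguments while accepting a launcher-supplied --local_rank."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_args_file", type=str, default="train_args/sft.json")
    # Accept the flag from the error message instead of rejecting it.
    parser.add_argument("--local_rank", type=int, default=-1)
    # parse_known_args collects any other unexpected flags instead of exiting.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


args, unknown = parse_args(
    ["--train_args_file", "train_args/sft.json", "--local_rank=1"]
)
print(args.local_rank, unknown)
```

Whether this is appropriate depends on how the real script builds its arguments, so treat it as a diagnostic aid rather than the fix.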
> I tried to replicate your experiment on lingvo, and I tried to keep everything you mentioned the same. However, I can only get a minimum WER of 25% across all my...
> Hi, thanks for kindly sharing your code! > > I have tried to use the LibriSpeech dataset for training, and according to other issues, I run: `CUDA_VISIBLE_DEVICES=1,2,3,4,5 python -m...