nihaoUCAS comments

Results: 10 comments by nihaoUCAS

pred_loss decreases fast while avg_acc stays at 50%

Me too.

pred_loss decreases fast while avg_acc stays at 50%

> Hmm, interesting. Is this the result of the 0.0.1a4 version?
> And how did you guys print out that result?

With version 0.0.01a3, the result is printed by the `bert` command...

Question about the RNN-T loss arguments

I guess U = len() + len(labels), with len() = 1. It shouldn't be in the labels, but in the encoder logits.
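
To illustrate where the extra position belongs, here is a minimal pure-Python sketch of the RNN-T forward (alpha) recursion. It assumes the joint network's log-softmax output is indexed as `log_probs[t][u][k]` with `U + 1` label positions for `U = len(labels)`, and that blank is index 0; both are assumptions, and `rnnt_loss` is a hypothetical helper, not the signature of any particular RNN-T loss library. The labels themselves contain no blank; the `+ 1` lives in the logits.

```python
import math
from typing import Sequence

BLANK = 0  # assumption: blank symbol at vocabulary index 0


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss(log_probs: Sequence[Sequence[Sequence[float]]],
              labels: Sequence[int],
              blank: int = BLANK) -> float:
    """Negative log-likelihood of `labels` under an RNN-T joint output.

    log_probs[t][u][k]: t in [0, T) encoder frames, u in [0, U] label
    positions (U = len(labels)), k over the vocabulary including blank.
    """
    T = len(log_probs)
    U = len(labels)
    # The joint output needs U + 1 label positions; the labels hold only U symbols.
    if any(len(frame) != U + 1 for frame in log_probs):
        raise ValueError("joint output needs U + 1 label positions, U = len(labels)")
    if blank in labels:
        raise ValueError("labels must not contain the blank symbol")

    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive from (t-1, u) by emitting blank, i.e. advancing one frame.
            no_emit = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Arrive from (t, u-1) by emitting the next label.
            emit = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = _logaddexp(no_emit, emit)
    # Every alignment ends with a final blank at the last frame.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

Passing logits with only `len(labels)` label positions trips the shape check instead of silently misaligning the recursion.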

GPU utilization drops to 0% without any error info

@iamxiaoyubei Did you solve the problem? I have the same problem.

GPU utilization drops to 0% without any error info

I solved the problem by setting: `export TF_CUDNN_USE_AUTOTUNE=0`
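
If the training job is started from a Python entry script rather than a shell, the same setting can be applied inside the script. A minimal sketch, assuming the variable only needs to be present in the process environment before TensorFlow reads it; setting it at the very top of the script, before `import tensorflow`, is the safe choice:

```python
import os

# Disable cuDNN autotuning. Set this before TensorFlow is imported (and therefore
# before any convolution runs) so the setting is already in place when TF checks it.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```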

Cannot find code for minimum word error rate

Watching this.

[Question] Can baichuan-7b support merging a LoRA model?

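In principle this is supported for all of them: simply multiply the two LoRA matrices together, scale the product, and finally add it to the original weight matrix.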
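
The merge described above can be sketched as follows, assuming the common LoRA parameterization where A is r x d_in, B is d_out x r, and the scale is alpha / r (training codebases may define the scale differently; `merge_lora` is a hypothetical helper, not the API of any specific library). Plain lists keep the sketch dependency-free; in practice the same formula is applied to the tensors of each adapted layer.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A, leaving the inputs unmodified.

    weight: d_out x d_in, lora_a: rank x d_in, lora_b: d_out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, d_out x d_in
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]
```

After merging, the adapter matrices are no longer needed at inference time, since the update is folded into the base weight.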

Error when training with DeepSpeed: train.py: error: unrecognized arguments: --local_rank=1

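> > > The command I ran was `deepspeed --num_gpus=2 train.py --train_args_file train_args/sft.json`, and it failed with an error.
> > > The DeepSpeed config file ds_z3_config.json doesn't seem to be used anywhere in the project?
> > >
> > > `train.py: error: unrecognized arguments: --local_rank=1`
> >
> > I get this error too. Did you manage to solve it?
> >
> > Yes, it seems to be a torch version issue: my version was too new (2.0), not the one specified in the docs (1.13)...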
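
The fix reported in the thread is installing the torch version the docs specify. A different workaround, sketched here as an assumption rather than the project's actual code, is to let the training script's parser accept the `--local_rank=1` flag that the error message shows being appended by the launcher. `parse_args` is hypothetical; only `--train_args_file` mirrors the command above, and the real `train.py` may parse its arguments differently.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical argument parser for train.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_args_file", type=str, required=True)
    # Declare the per-process rank flag appended by the launcher so argparse does not
    # reject it; fall back to the LOCAL_RANK environment variable, else -1.
    parser.add_argument(
        "--local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", -1)),
    )
    return parser.parse_args(argv)
```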

Failed to replicate the experiment on lingvo

> I tried to replicate your experiment on lingvo and kept everything you mentioned the same. However, I can only get a minimum WER of 25% across all my...

Any advice on solving multi-GPU training failure?

> Hi, thanks for kindly sharing your code!
>
> I have tried to use the LibriSpeech dataset for training, and, following other issues, I ran: `CUDA_VISIBLE_DEVICES=1,2,3,4,5 python -m...