jiyunjie comments

Results: 9 comments by jiyunjie

Two approaches to improve the performance

I read your evaluation scripts more closely and found that your evaluation metrics are too strict. In your code, if an argument is correctly classified, its corresponding predicted trigger should also...

Why do Key and Query Masking?

Isn't query masking unnecessary, since the padded queries will be masked out by the next block's key masking?
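The question can be checked numerically. The following is a minimal sketch, not the repository's code: it assumes single-head self-attention with identity projections, no residual connections, and a hypothetical `attention` helper written only for this illustration. It stacks two blocks and compares the outputs with and without query masking.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, pad_mask, query_masking):
    """Toy self-attention block (hypothetical helper, identity projections).

    x: (T, d) inputs; pad_mask: (T,) bool, True for real tokens.
    """
    scores = x @ x.T / np.sqrt(x.shape[-1])
    # Key masking: padded keys get a large negative score, so softmax
    # assigns them zero weight.
    scores = np.where(pad_mask[None, :], scores, -1e9)
    out = softmax(scores) @ x
    if query_masking:
        # Query masking: zero the output rows of padded queries.
        out = out * pad_mask[:, None]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
pad_mask = np.array([True, True, True, False, False])  # last two are padding

# Two stacked blocks, with and without query masking.
with_qm = attention(attention(x, pad_mask, True), pad_mask, True)
without_qm = attention(attention(x, pad_mask, False), pad_mask, False)

# Rows of real tokens agree; only the padded rows differ.
real_rows_match = np.allclose(with_qm[pad_mask], without_qm[pad_mask])
print(real_rows_match)
```

Under these assumptions the real-token rows are identical, because padded rows can only influence later blocks as keys and values, and key masking already removes them. The padded rows themselves do differ, so this only supports dropping query masking if the loss and any pooling also ignore padded positions.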

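Training is extremely slow on slightly longer sentences

This is probably a code-efficiency problem. With the same hardware and parameter configuration, other open-source code trains a Bi-LSTM+CRF model of the same structure very quickly. @XINGXIAOYU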

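Have you compared LLaMA-7B and Bloom-7B on Chinese after fine-tuning?

> Same question: are you considering releasing a model based on LLaMA-7B? LLAMA-7B has already been released; it was trained on 2 million data samples.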

[QUESTION] Training Mixtral 8x7B on 16 x H100 only achieves low throughput of 130 TFLOPS

> > > Thank you for reporting this issue. 130 TFLOPS is indeed too low for the H100. I quickly reviewed your script and have some suggestions: > > >...

[QUESTION] Training Mixtral 8x7B on 16 x H100 only achieves low throughput of 130 TFLOPS

> Hi, thanks for the suggestions. > I retested the throughput according to your suggestions. > To be more specific: > > 1. Update Megatron-LM to the latest commit (https://github.com/NVIDIA/Megatron-LM/commit/ba773259dbe5735fbd91ca41e7f4ded60b335c52) >...

F1 score dropping to zero

> Hi, > During my tests I ran into the same problem. Even after many epochs, precision, recall, and F1 are 0 most of the time. > ![image](https://user-images.githubusercontent.com/23587964/56629380-bed4af00-667f-11e9-83bb-66e99e35a1f2.png) > Did you find the...

feature-request: publish half-precision models

Actually, you can train or run inference in fp16 with a few simple settings; [this doc](https://huggingface.co/docs/transformers/v4.13.0/en/performance#fp16) may help you. We are going to provide some mixed-precision training and inference demo code as...

NaN in multi-node training

same issue