zizhan issues

8 issues by zizhan

multi-gpus

Can the distillation networks be trained on multiple GPUs?

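> Here you can mask out the padding part when computing the loss.
>
> However, when I wrote it, the padding idx was 0, so the effect on the loss computation was small, and I did not consider masking.

@xuanzebi Hello, why is the effect on the loss small when the padding label id is 0? In that case the 0th dimension of the one-hot label vector is 1, right?

_Originally posted by @zdgithub in https://github.com/xuanzebi/BERT-CH-NER/issues/6#issuecomment-1054929224_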

how to download the tweets

Sorry, I don't know how to download the tweets corresponding to the published IDs. Could you provide some instructions, or the tweet texts instead?

Why not filter out the [PAD] tokens when computing loss?

I found that the loss is also computed for the [PAD] tokens when training the model. In fact, we should filter them out, as https://github.com/kyzhouhzau/BERT-NER/blob/master/BERT_NER.py does. Why didn't you...
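To make the concern concrete, here is a minimal sketch in plain Python, not taken from either repository. It assumes padding positions carry label id 0 and that a 0/1 mask marks the real tokens; the names `token_losses` and `masked_mean_loss` and the toy logits are hypothetical. It shows that padding positions still add a non-zero cross-entropy term unless they are masked out.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token's class logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_losses(logits_seq, label_ids):
    # Per-token cross-entropy: -log p(label) at each position.
    return [-log_softmax(logits)[y] for logits, y in zip(logits_seq, label_ids)]

def masked_mean_loss(logits_seq, label_ids, mask):
    # Average the loss over positions where mask == 1 (real tokens) only.
    losses = token_losses(logits_seq, label_ids)
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / max(len(kept), 1)

# Toy sequence: two real tokens followed by two [PAD] positions.
# Assumption: padding positions are assigned label id 0.
logits_seq = [
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.0, 0.0],  # [PAD]
    [0.0, 0.0, 0.0],  # [PAD]
]
label_ids = [1, 0, 0, 0]
mask = [1, 1, 0, 0]

per_token = token_losses(logits_seq, label_ids)
unmasked_loss = sum(per_token) / len(per_token)
masked_loss = masked_mean_loss(logits_seq, label_ids, mask)
```

With uniform logits, each padding position contributes `log(3)` to the unmasked sum, so `unmasked_loss` and `masked_loss` differ. How much this matters in practice depends on how much of each batch is padding.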

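Near-homophone character replacement

It seems that `近音字替换` (near-homophone character replacement) is not implemented? What is implemented is `同音字替换` (homophone character replacement)?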

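Download sources for the Chinese datasets

Hello, in the README you provide the download source for the OntoNotes 4.0 Chinese NER dataset, but not for the other three Chinese datasets: MSRA, Resume, and Weibo. Could you provide those as well? I downloaded these Chinese NER datasets from the web myself, but their sizes differ from those described in the appendix of your paper, so a fair comparison is not possible. I hope you can share the sources of these three Chinese NER datasets. Thanks!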

unlabeled data

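Hello! The paper mentions that unlabeled data and lexicon-based annotation are used when searching for label words, but the data in the code directory `dataset/conll/distant_data` appears to be the full CoNLL03 dataset rather than data obtained through distant supervision. Could you explain this?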

What is the principle behind playing different roles?

Is there a certain amount of training data for each role?