
CLUENER2020: Chinese Fine-Grained Named Entity Recognition

59 CLUENER2020 issues

The annotation tool link under "Try the Demo" is broken. Could you update it?

Hi, I'd like to swap in a small BERT-style model such as RoBERTa-tiny or ALBERT-tiny on top of the pytorch_version code. Would that require major changes?

```
sh run_ner_span.sh
Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it.
Didn't find file /pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it.
Didn't find file /pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.
```
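As far as we can tell, the transformers 2.x-style tokenizer vendored under pytorch_version prints these "Didn't find file ... We won't load it." lines for optional tokenizer files, so on their own they do not indicate a failure. To rule out a genuinely incomplete checkpoint, one option is to check the model directory directly. The sketch below assumes the usual file names `vocab.txt`, `config.json`, and `pytorch_model.bin`; `check_bert_dir` is a hypothetical helper, not part of the repository.

```python
import os
import tempfile

# Assumed layout of a transformers 2.x-era BERT checkpoint directory.
REQUIRED_FILES = ("vocab.txt", "config.json", "pytorch_model.bin")
# Optional tokenizer files that only trigger the "Didn't find file" message.
OPTIONAL_FILES = ("added_tokens.json", "special_tokens_map.json", "tokenizer_config.json")


def check_bert_dir(model_dir):
    """Return (missing_required, missing_optional) file names for model_dir."""
    def missing(names):
        return [n for n in names if not os.path.isfile(os.path.join(model_dir, n))]

    return missing(REQUIRED_FILES), missing(OPTIONAL_FILES)


if __name__ == "__main__":
    # Demo on a throwaway directory that holds only the assumed required files.
    with tempfile.TemporaryDirectory() as d:
        for name in REQUIRED_FILES:
            open(os.path.join(d, name), "w").close()
        required, optional = check_bert_dir(d)
        print("missing required:", required)  # empty: checkpoint looks complete
        print("missing optional:", optional)  # the three files named in the log
```

In practice you would call `check_bert_dir("/pretrained_bert_models/bert-base-chinese")`; only a non-empty first list would point to a real problem.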

https://github.com/CLUEbenchmark/CLUENER2020/blob/b6597268c000e06aa95bcdc59ef122805254cab6/pytorch_version/models/transformers/modeling_bert.py#L606

With pytorch==1.6.0, I had to comment out `extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)` before the code would run; this may be related to the PyTorch version. Otherwise it fails with a StopIteration error:

```
Traceback (most recent call last):
  File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 499, in <module>
    main()
  File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 440, in main
    global_step, tr_loss = train(args, train_dataset, model,...
```

When I use run_ner_crf.py and switch to the XLNet model for NER, I run into the following error:

```
File "pytorch_version\models\transformers\tokenization_utils.py", line 639, in split_on_tokens
    if sub_text not in self.added_tokens_encoder \...
```

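When the same phrase appears several times in one sentence and every occurrence is labeled with the same entity type, the official count includes it only once. For example:

```
{"text": "两队上季曾在足总杯中相遇,纽卡客场0比0,主场4比1过关。不过纽卡本季的表现实在糟糕,", "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}}}
```

The official count is 2, but it should be 3.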
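The two totals come from two counting conventions: one counts distinct (type, name) keys, the other counts annotated spans. We have not checked which convention the official scorer actually implements; the sketch below only reproduces the arithmetic of this example, using the hypothetical helpers `count_unique_names` and `count_mentions`. Spans are end-inclusive, as the `[6, 8]` span for the three-character "足总杯" shows.

```python
# The sample from this issue, written as a Python dict.
sample = {
    "text": "两队上季曾在足总杯中相遇,纽卡客场0比0,主场4比1过关。不过纽卡本季的表现实在糟糕,",
    "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}},
}


def count_unique_names(label):
    """Count one entity per (type, surface string), however often it occurs."""
    return sum(len(names) for names in label.values())


def count_mentions(label):
    """Count one entity per annotated [start, end] span."""
    return sum(len(spans) for names in label.values() for spans in names.values())


def spans_consistent(sample):
    """Check that every end-inclusive span really covers its surface string."""
    text = sample["text"]
    return all(
        text[start : end + 1] == name
        for names in sample["label"].values()
        for name, spans in names.items()
        for start, end in spans
    )


print(count_unique_names(sample["label"]))  # 2
print(count_mentions(sample["label"]))      # 3
print(spans_consistent(sample))             # True
```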

![image](https://user-images.githubusercontent.com/69983328/148165235-f827c200-db0b-4ec4-8b18-2d125d5e5301.png)

This code is from CLUENER2020/bilstm_crf_pytorch/run_lstm_crf.py; there is a problem with the args.output_dir logic here.

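Looking through the annotations, I noticed that some samples look like this:

![image](https://user-images.githubusercontent.com/55855936/147634079-4cd8096f-02b6-4278-b167-1f7f3948e773.png)

Here "叶老桂" is annotated only once; its later occurrence is not annotated. Is this intentional? My understanding is that if an entity appears several times but only its first occurrence is labeled, a model trained on this data will learn to attend only to first occurrences and will fail to predict some sentences accurately. Shouldn't every repeated occurrence of an entity be annotated as well?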
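One way to measure how widespread this is would be to scan each sample for occurrences of an already-labeled name that carry no span of their own. The sketch below does that with a hypothetical helper, `find_unlabeled_repeats`. The screenshot's text is not reproduced here, so the example sentence is invented for illustration and is not taken from the dataset.

```python
def find_unlabeled_repeats(sample):
    """List (type, name, start, end) occurrences of labeled names that have no span."""
    text = sample["text"]
    missing = []
    for ent_type, names in sample["label"].items():
        for name, spans in names.items():
            labeled = {tuple(span) for span in spans}
            start = text.find(name)
            while start != -1:
                end = start + len(name) - 1  # CLUENER spans are end-inclusive
                if (start, end) not in labeled:
                    missing.append((ent_type, name, start, end))
                start = text.find(name, start + 1)
    return missing


# Invented sample: the name occurs twice but only the first occurrence is labeled.
example = {
    "text": "叶老桂是一名教师,学生们都很敬重叶老桂。",
    "label": {"name": {"叶老桂": [[0, 2]]}},
}
print(find_unlabeled_repeats(example))  # [('name', '叶老桂', 16, 18)]
```

A plain substring search can also flag occurrences that are legitimately part of a longer entity, so any hits would still need manual review.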

```
for feat in feats:
    next_tag_var = (
        forward_var.view(1, -1).expand(self.tagset_size, self.tagset_size)
        + self.transitions
    )
    _, bptrs_t = torch.max(next_tag_var, dim=1)
    viterbivars_t = next_tag_var[range(len(bptrs_t)), bptrs_t]
    forward_var = viterbivars_t + feat
    backscores.append(forward_var)
    backpointers.append(bptrs_t)
...
```
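For reference, the recursion in this loop can be checked against a plain-Python Viterbi decoder with the same layout: because `forward_var` is broadcast along rows before `self.transitions` is added, `transitions[next_tag][prev_tag]` scores a move from `prev_tag` to `next_tag`, and the max over `dim=1` picks the best previous tag for each next tag. This is a simplified sketch, not the repository's implementation: it assumes no START/STOP tags (the first position is scored by its emissions alone) and uses lists instead of tensors. `viterbi_decode` is a hypothetical name.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sentence.

    emissions[t][k]: score of tag k at position t.
    transitions[i][j]: score of moving from tag j to tag i (rows = next tag).
    Simplification: no START/STOP tags.
    """
    num_tags = len(transitions)
    forward_var = list(emissions[0])
    backpointers = []
    for feat in emissions[1:]:
        bptrs_t, viterbivars_t = [], []
        for next_tag in range(num_tags):
            # Row next_tag of next_tag_var: forward_var[prev] + transitions[next_tag][prev]
            scores = [forward_var[prev] + transitions[next_tag][prev] for prev in range(num_tags)]
            best_prev = max(range(num_tags), key=scores.__getitem__)
            bptrs_t.append(best_prev)
            viterbivars_t.append(scores[best_prev])
        # Add the emission scores only after taking the max, as in the loop above.
        forward_var = [v + e for v, e in zip(viterbivars_t, feat)]
        backpointers.append(bptrs_t)
    best_last = max(range(num_tags), key=forward_var.__getitem__)
    path = [best_last]
    for bptrs_t in reversed(backpointers):
        path.append(bptrs_t[path[-1]])
    path.reverse()
    return forward_var[best_last], path


# Switching tags costs 5, so staying on tag 0 beats following the emissions.
print(viterbi_decode([[1, 0], [0, 1], [1, 0]], [[0, -5], [-5, 0]]))  # (2, [0, 0, 0])
```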

In score.py, if entity positions are ignored, doesn't the evaluation just turn into multi-label classification? https://github.com/CLUEbenchmark/CLUENER2020/blob/master/score.py
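The difference this issue raises can be made concrete by scoring the same prediction twice: once matching entities on (type, text, start, end) and once on (type, text) alone. The sketch below does not reproduce score.py; `entity_set` and `prf` are hypothetical helpers, and the gold and predicted labels are small illustrative inputs in the CLUENER label format.

```python
def entity_set(label, use_positions=True):
    """Flatten a CLUENER-style label dict into a set of comparable entities."""
    items = set()
    for ent_type, names in label.items():
        for name, spans in names.items():
            if use_positions:
                items.update((ent_type, name, start, end) for start, end in spans)
            else:
                items.add((ent_type, name))  # all mentions collapse into one key
    return items


def prf(gold, pred, use_positions=True):
    """Micro precision, recall, and F1 under the chosen matching rule."""
    g, p = entity_set(gold, use_positions), entity_set(pred, use_positions)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {"organization": {"纽卡": [[13, 14], [31, 32]]}}
pred = {"organization": {"纽卡": [[13, 14]]}}  # the second mention is missed
print(prf(gold, pred, use_positions=False))  # (1.0, 1.0, 1.0)
print(prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Under text-only matching the prediction gets full credit even though one mention was missed, which is the set-of-labels behavior the question describes.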