CLUENER2020
CLUENER2020: Chinese Fine Grained Named Entity Recognition
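The annotation tool link under "Try the Demo" is broken. Could you update it?
Hi, I'd like to build on the pytorch_version code and experiment with a small BERT-style model such as RoBERTa-tiny or ALBERT-tiny instead. Would that require major changes?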
Running `sh run_ner_span.sh` prints: Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.
https://github.com/CLUEbenchmark/CLUENER2020/blob/b6597268c000e06aa95bcdc59ef122805254cab6/pytorch_version/models/transformers/modeling_bert.py#L606 With pytorch==1.6.0, I had to comment out `extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)` to get the code to run; otherwise it raises StopIteration. This is probably related to the PyTorch version. `Traceback (most recent call last): File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 499, in main() File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 440, in main global_step, tr_loss = train(args, train_dataset, model,...
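A likely cause, offered as an assumption rather than a confirmed diagnosis: on newer PyTorch versions `self.parameters()` can yield nothing (this is commonly reported when the model runs under `torch.nn.DataParallel`), and calling `next()` on an empty generator raises StopIteration. Instead of commenting the line out, one option is to give `next()` a default value. The helper below, `first_param_dtype`, is a hypothetical name; it is written without importing torch so that it runs standalone, and it accepts any iterable of objects that have a `.dtype` attribute.

```python
def first_param_dtype(parameters, default):
    """Return the .dtype of the first item in `parameters`, or `default` if empty.

    next(iterator) raises StopIteration on an empty iterator; passing a second
    argument to next() returns that value instead of raising.
    """
    first = next(iter(parameters), None)
    return default if first is None else first.dtype


# An empty generator, like the one parameters() may return in the failing case
empty_params = (p for p in [])
print(first_param_dtype(empty_params, "float32"))  # float32
```

Inside `modeling_bert.py` this would become something like `extended_attention_mask.to(dtype=first_param_dtype(self.parameters(), torch.float32))`. That fallback assumes the model runs in float32; for fp16 training the default would have to match.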
When I use run_ner_crf.py and try to switch to an XLNet model for NER, I get the following error: `File "pytorch_version\models\transformers\tokenization_utils.py", line 639, in split_on_tokens if sub_text not in self.added_tokens_encoder \...
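When the same word appears several times in one sentence and each occurrence is labeled with the same entity type, the official scorer counts it only once. For example, in {"text": "两队上季曾在足总杯中相遇,纽卡客场0比0,主场4比1过关。不过纽卡本季的表现实在糟糕,", "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}}} the official count is 2, but it should be 3.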
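To make the difference concrete, here is a minimal sketch, assuming the scorer keys entities by their text as the issue describes. It turns a CLUENER-style `label` dict into a set of entities under two keying schemes: by `(type, text)`, which collapses repeated mentions, and by `(type, start, end)`, which keeps every mention. The functions `entities_by_name`, `entities_by_span`, and `f1` are hypothetical helpers, not the repository's score.py.

```python
def entities_by_name(label):
    """Key entities by (type, text); repeated mentions collapse into one."""
    return {(etype, text) for etype, mentions in label.items() for text in mentions}


def entities_by_span(label):
    """Key entities by (type, start, end); every labeled mention is kept."""
    return {
        (etype, start, end)
        for etype, mentions in label.items()
        for spans in mentions.values()
        for start, end in spans
    }


def f1(gold, pred):
    """Micro F1 over two sets of entity keys."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


example = {
    "text": "两队上季曾在足总杯中相遇,纽卡客场0比0,主场4比1过关。不过纽卡本季的表现实在糟糕,",
    "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}},
}
print(len(entities_by_name(example["label"])))  # 2
print(len(entities_by_span(example["label"])))  # 3

# Hypothetical prediction that finds only the first 纽卡
pred = {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14]]}}
print(f1(entities_by_name(example["label"]), entities_by_name(pred)))  # 1.0
print(round(f1(entities_by_span(example["label"]), entities_by_span(pred)), 4))  # 0.8
```

With name keying, a prediction that misses the second 纽卡 still scores a perfect F1; with span keying the missed mention lowers recall to 2/3.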
This code comes from CLUENER2020/bilstm_crf_pytorch/run_lstm_crf.py; the args.output_dir logic here has a problem.
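Looking through the annotations, I found samples where "叶老桂" is labeled only once and its second occurrence is not labeled. Is this intentional? My understanding is that if an entity appears several times but only the first occurrence is labeled, a model trained on this data will also learn to focus only on first occurrences, and so will fail to predict some sentences accurately. Shouldn't repeated entities be labeled at every occurrence?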
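Unlabeled repeats like the second "叶老桂" could be located automatically. Below is a minimal heuristic sketch, assuming the CLUENER JSON format shown in the scoring issue above (inclusive `[start, end]` spans). It flags every occurrence of an already-labeled entity string that overlaps no labeled span. `unlabeled_repeats` is a hypothetical helper, not part of the repository, and plain substring matching will also flag accidental matches, so its output needs manual review.

```python
def unlabeled_repeats(example):
    """Return (type, text, start, end) for occurrences of a labeled entity
    string that overlap no labeled span. End indices are inclusive."""
    text = example["text"]
    labeled = [
        (start, end)
        for mentions in example["label"].values()
        for spans in mentions.values()
        for start, end in spans
    ]
    missing = []
    for etype, mentions in example["label"].items():
        for name in mentions:
            pos = text.find(name)
            while pos != -1:
                start, end = pos, pos + len(name) - 1  # inclusive end
                # Only report occurrences that overlap no existing labeled span
                if not any(s <= end and start <= e for s, e in labeled):
                    missing.append((etype, name, start, end))
                pos = text.find(name, pos + 1)
    return missing


# Hypothetical input: the 纽卡 sentence with its second span deliberately dropped
sample = {
    "text": "两队上季曾在足总杯中相遇,纽卡客场0比0,主场4比1过关。不过纽卡本季的表现实在糟糕,",
    "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14]]}},
}
print(unlabeled_repeats(sample))  # [('organization', '纽卡', 31, 32)]
```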
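```
for feat in feats:
    # next_tag_var[i, j] = forward_var[j] + self.transitions[i, j]:
    # best score of a path ending in tag j, extended to tag i
    next_tag_var = (
        forward_var.view(1, -1).expand(self.tagset_size, self.tagset_size)
        + self.transitions
    )
    # For each tag i, the best previous tag j and its score
    _, bptrs_t = torch.max(next_tag_var, dim=1)
    viterbivars_t = next_tag_var[range(len(bptrs_t)), bptrs_t]
    # Add this step's emission scores
    forward_var = viterbivars_t + feat
    backscores.append(forward_var)
    backpointers.append(bptrs_t)
...
```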
In score.py, if entity positions are ignored, doesn't the evaluation just become multi-label classification? https://github.com/CLUEbenchmark/CLUENER2020/blob/master/score.py