CLUENER2020 issues

Annotation tool link in the Demo section is broken

1

The annotation tool link in the Demo section (体验Demo) no longer works. Could it be updated?

nlper01

Experiments with small BERT models such as Robert-tiny or Albert-tiny

Hello, I would like to start from pytorch_version and switch to a small BERT model such as Robert-tiny or Albert-tiny for my experiments. Would that require large changes?

zy614582280

Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.

1

Running sh run_ner_span.sh prints: Didn't find file /pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it. Didn't find file /pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.

550952213

With pytorch==1.6.0, extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) has to be commented out for the code to run

2

https://github.com/CLUEbenchmark/CLUENER2020/blob/b6597268c000e06aa95bcdc59ef122805254cab6/pytorch_version/models/transformers/modeling_bert.py#L606 With pytorch==1.6.0, the line extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) has to be commented out for the code to run. This may be related to the PyTorch version. Otherwise a StopIteration error is raised: `Traceback (most recent call last): File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 499, in main() File "/home/admin/zihe.zhu/20200902-CLUENER2020/pytorch_version/run_ner_crf.py", line 440, in main global_step, tr_loss = train(args, train_dataset, model,...
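A StopIteration at this line means that next() was called on an iterator with no items, i.e. self.parameters() yielded nothing at that moment. Why the module has no parameters there is not established in the report. As an alternative to commenting the cast out, the lookup can fall back to a default dtype. The sketch below shows the pattern in plain Python; `first_dtype` and the stand-in parameter objects are hypothetical and not part of the repository.

```python
from types import SimpleNamespace


def first_dtype(parameters, default):
    """Return the dtype of the first parameter, or `default` if there is none."""
    # next() with a default value never raises StopIteration.
    first = next(iter(parameters), None)
    return default if first is None else first.dtype


# Stand-ins for model parameters; a real call would pass self.parameters().
with_params = [SimpleNamespace(dtype="float16")]
without_params = []

print(first_dtype(with_params, "float32"))     # float16
print(first_dtype(without_params, "float32"))  # float32
```

In the model code this would presumably become a call such as first_dtype(self.parameters(), torch.float32), which is an assumption about how the fix would be wired in rather than a tested patch.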

IamHehe

TypeError: unhashable type: 'list'

2

When I use run_ner_crf.py and switch to the XLNet model for NER, I come across the following error: `File "pytorch_version\models\transformers\tokenization_utils.py", line 639, in split_on_tokens if sub_text not in self.added_tokens_encoder \...
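The quoted line performs a membership test against self.added_tokens_encoder. If that object is a dict, the test hashes sub_text, so the error would appear whenever sub_text is a list instead of a string. The sketch below reproduces only that mechanism; `added_tokens_encoder` and `is_added_token` are hypothetical stand-ins, and it does not show why the XLNet path ends up passing a list.

```python
# Hypothetical stand-in for the tokenizer's added-token mapping.
added_tokens_encoder = {"<special>": 0}


def is_added_token(sub_text):
    # Membership in a dict hashes the key, so the key must be hashable.
    return sub_text in added_tokens_encoder


print(is_added_token("<special>"))  # True

error = None
try:
    is_added_token(["already", "split", "tokens"])  # a list, not a string
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```

Checking the type of the text handed to the tokenizer (string versus list of tokens) is therefore a reasonable first debugging step.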

SmartMapple

Error in the official label distribution statistics

2

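When the same word appears several times in one sentence and is annotated with the same entity type each time, the official statistics count it only once. For example, for {"text": "两队上季曾在足总杯中相遇，纽卡客场0比0，主场4比1过关。不过纽卡本季的表现实在糟糕，", "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}}} the official count is 2, whereas it should be 3.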
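The difference between the two counts can be sketched as follows, using the record quoted above. `count_mentions` is a hypothetical helper, not the script that produced the official statistics: counting distinct entity strings gives 2, while counting annotated spans gives 3.

```python
from collections import Counter

record = {
    "text": "两队上季曾在足总杯中相遇，纽卡客场0比0，主场4比1过关。不过纽卡本季的表现实在糟糕，",
    "label": {"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}},
}


def count_mentions(records):
    """Count entities per type, once by distinct string and once by span."""
    by_name, by_span = Counter(), Counter()
    for rec in records:
        for entity_type, names in rec["label"].items():
            by_name[entity_type] += len(names)  # one per distinct string
            by_span[entity_type] += sum(len(spans) for spans in names.values())
    return by_name, by_span


by_name, by_span = count_mentions([record])
print(by_name["organization"], by_span["organization"])  # 2 3
```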

heyoma

Problem with args.output_dir in run_lstm_crf.py

1

![image](https://user-images.githubusercontent.com/69983328/148165235-f827c200-db0b-4ec4-8b18-2d125d5e5301.png) This code comes from CLUENER2020/bilstm_crf_pytorch/run_lstm_crf.py. The logic around args.output_dir here looks wrong.

TraineeMagician

Some questions about the dataset annotation

1

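Looking through the annotations, I found samples like this one: ![image](https://user-images.githubusercontent.com/55855936/147634079-4cd8096f-02b6-4278-b167-1f7f3948e773.png) Here "叶老桂" is annotated only once, and its later occurrence is not annotated. Is this intentional? As I understand it, if an entity appears several times but only the first occurrence is annotated, a model trained on the data will also attend only to the first occurrence and will fail to predict some sentences accurately. Should repeated entities be annotated each time they occur?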
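One way to gauge how common this is would be to scan each record for occurrences of an annotated string that no span covers. The sketch below assumes the record layout used in the dataset's JSON lines (inclusive end offsets); `unlabeled_repeats` is a hypothetical helper and the sample sentence is made up for illustration, not taken from the dataset.

```python
def unlabeled_repeats(record):
    """List occurrences of annotated strings that have no matching span."""
    text = record["text"]
    missing = []
    for entity_type, names in record["label"].items():
        for name, spans in names.items():
            labeled = {tuple(span) for span in spans}
            start = text.find(name)
            while start != -1:
                span = (start, start + len(name) - 1)  # inclusive end offset
                if span not in labeled:
                    missing.append((entity_type, name, span))
                start = text.find(name, start + 1)
    return missing


# Made-up sample: the second "叶老桂" carries no annotation.
sample = {
    "text": "叶老桂说，叶老桂明天来。",
    "label": {"name": {"叶老桂": [[0, 2]]}},
}
print(unlabeled_repeats(sample))  # [('name', '叶老桂', (5, 7))]
```

Plain string matching can also flag occurrences that are legitimately not entities, so the output is a list of candidates for review rather than a list of annotation errors.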

PeihanDou

Is there a problem with the Viterbi decode?

```
for feat in feats:
    next_tag_var = (
        forward_var.view(1, -1).expand(self.tagset_size, self.tagset_size)
        + self.transitions
    )
    _, bptrs_t = torch.max(next_tag_var, dim=1)
    viterbivars_t = next_tag_var[range(len(bptrs_t)), bptrs_t]
    forward_var = viterbivars_t + feat
    backscores.append(forward_var)
    backpointers.append(bptrs_t)
...
```
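For comparison when reviewing the loop above, here is a minimal pure-Python Viterbi sketch. It assumes the same convention the snippet appears to use, namely that transitions[i][j] scores moving from tag j to tag i, and it leaves out START/STOP tags for brevity. `viterbi_decode` is written for illustration and is not the repository's implementation.

```python
def viterbi_decode(feats, transitions):
    """Return (best_score, best_path).

    feats[t][i]:       emission score of tag i at step t
    transitions[i][j]: score of moving from tag j to tag i
    """
    n_tags = len(transitions)
    forward = list(feats[0])  # no START tag: step 0 uses emissions only
    backpointers = []
    for feat in feats[1:]:
        bptrs, new_forward = [], []
        for i in range(n_tags):
            # Score of reaching tag i from every possible previous tag j.
            scores = [forward[j] + transitions[i][j] for j in range(n_tags)]
            best_prev = max(range(n_tags), key=scores.__getitem__)
            bptrs.append(best_prev)
            new_forward.append(scores[best_prev] + feat[i])
        forward = new_forward
        backpointers.append(bptrs)

    # Follow the back pointers from the best final tag.
    best_last = max(range(n_tags), key=forward.__getitem__)
    path = [best_last]
    for bptrs in reversed(backpointers):
        path.append(bptrs[path[-1]])
    path.reverse()
    return forward[best_last], path


feats = [[1, 0], [0, 1], [1, 0]]
transitions = [[0, -5], [-5, 0]]  # staying is free, switching costs 5
print(viterbi_decode(feats, transitions))  # (2, [0, 0, 0])
```

Running a candidate implementation and a brute-force search over all tag sequences on small inputs like this is a simple way to check whether the decode is correct.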

tymanman

Why does the evaluation look only at the category and not at the position?

2

In score.py, if positions are ignored, doesn't the task turn into multi-label classification? https://github.com/CLUEbenchmark/CLUENER2020/blob/master/score.py
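A position-aware alternative would treat each (type, start, end) triple as one prediction and require an exact match. The sketch below assumes labels in the dataset's nested {type: {text: [[start, end], ...]}} layout; `to_spans` and `span_f1` are hypothetical helpers and do not describe what score.py actually computes.

```python
def to_spans(label):
    """Flatten a label dict into a set of (type, start, end) triples."""
    return {
        (entity_type, start, end)
        for entity_type, names in label.items()
        for spans in names.values()
        for start, end in spans
    }


def span_f1(gold_labels, pred_labels):
    """Micro precision, recall and F1 with exact type and position match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_labels, pred_labels):
        gold_spans, pred_spans = to_spans(gold), to_spans(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [{"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [31, 32]]}}]
# Right entity strings, but one "纽卡" span is at the wrong position.
pred = [{"organization": {"足总杯": [[6, 8]], "纽卡": [[13, 14], [30, 31]]}}]
precision, recall, f1 = span_f1(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Under a comparison that ignores position, the same prediction would count as fully correct, which is the gap the question points at.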

XuJianzhi
