ChineseNER: Could we add new words?

For example, if a word (北大) is not recognized as an organisation, could we add it so that the model learns it?

May 17 '19 02:05 hgjt8989

Of course. You can add 北/B_org 大/E_org to the training set.
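A minimal sketch of producing lines in the char/tag style shown above (北/B_org 大/E_org), assuming a B_/M_/E_ prefix scheme with O for non-entity characters. The helper name tag_sentence is hypothetical, and the S_ tag for one-character entities is an assumption; check the tag set your training data actually uses before appending anything.

```python
def tag_sentence(sentence, entities):
    """Hypothetical helper: return space-separated 'char/tag' tokens.

    entities is a list of (start, end, label) character spans, end exclusive.
    Characters outside every span are tagged O.
    """
    tags = ["O"] * len(sentence)
    for start, end, label in entities:
        if end - start == 1:
            # Assumption: single-character entities use an S_ prefix.
            tags[start] = f"S_{label}"
            continue
        tags[start] = f"B_{label}"          # first character of the entity
        for i in range(start + 1, end - 1):
            tags[i] = f"M_{label}"          # middle characters
        tags[end - 1] = f"E_{label}"        # last character
    return " ".join(f"{ch}/{tag}" for ch, tag in zip(sentence, tags))


line = tag_sentence("我在北大读书", [(2, 4, "org")])
print(line)  # 我/O 在/O 北/B_org 大/E_org 读/O 书/O
```

Several such sentences containing 北大 in different contexts would give the model more to learn from than a single example.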

May 21 '19 00:05 buppt

@buppt Thanks. A question: for the TensorFlow version, do I first run python train.py and then python train.py pretrained?

May 23 '19 01:05 kedimomo

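No need. python train.py trains without pretrained word embeddings, while python train.py pretrained trains with pretrained word embeddings. They are alternatives, not two steps.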
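To illustrate how one script can pick a mode from its command-line arguments, here is a hypothetical sketch of dispatching on len(sys.argv). It is not the project's train.py; parse_args and the returned keys are invented for illustration. Note that sys.argv[0] is the script name, so python train.py pretrained gives len(sys.argv) == 2, and an elif len(sys.argv)==3: branch only runs when two extra arguments are passed. Treating the last of those as a file name is an assumption; check the README for the real meaning.

```python
import sys


def parse_args(argv):
    """Hypothetical train.py-style dispatch on the number of arguments."""
    if len(argv) == 1:
        # python train.py -> embeddings are learned from scratch
        return {"use_pretrained": False, "file": None}
    elif len(argv) == 2 and argv[1] == "pretrained":
        # python train.py pretrained -> initialise from pretrained word vectors
        return {"use_pretrained": True, "file": None}
    elif len(argv) == 3:
        # Assumption: the third argument is a file name (see the README).
        return {"use_pretrained": argv[1] == "pretrained", "file": argv[2]}
    raise SystemExit("usage: python train.py [pretrained [FILE]]")


if __name__ == "__main__":
    print(parse_args(sys.argv))
```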

May 23 '19 03:05 buppt

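OK, thanks O(∩_∩)O. One more question: how do I add data incrementally? Does every batch of new sentences have to be appended to the earlier training set, with train rerun from scratch each time?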

May 23 '19 06:05 kedimomo

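@buppt Is the elif len(sys.argv)==3: branch in train.py ever executed? I looked for a long time and never saw any way of passing three arguments. Is it redundant code? Thanks.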

May 23 '19 12:05 kedimomo

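What do you mean? Do you want to add some example sentences for entities yourself? You can either put them in the training set or continue training on top of the already trained model. As for three arguments, isn't that the form that takes a file name? It's in the README.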
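If you take the first route (putting new example sentences into the training set and retraining), a small helper can merge them in without duplicating lines already there. This is a hypothetical sketch: merge_training_lines is an invented name, and it assumes one tagged sentence per line in the char/tag format quoted earlier in this thread.

```python
def merge_training_lines(existing, new):
    """Hypothetical helper: append new tagged sentences to existing ones.

    Blank lines and exact duplicates are dropped; the original order is kept,
    with new sentences placed after the existing ones.
    """
    merged, seen = [], set()
    for line in list(existing) + list(new):
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


old = ["我/O 在/O 北/B_org 大/E_org", ""]
new = ["我/O 在/O 北/B_org 大/E_org", "他/O 去/O 北/B_org 大/E_org"]
print(merge_training_lines(old, new))
```

The merged list can then be written back to the training file before rerunning the preprocessing and training steps.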

May 23 '19 12:05 buppt

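@buppt Right, I had missed it: there is also a batch mode that processes a file. Thanks, the steps are described in fair detail. I have now finished running train.py. If I want to add new corpus data and continue training from the current model, which command should I run? Thanks.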

May 23 '19 15:05 kedimomo

Could someone share a model trained with the TensorFlow version?

May 28 '19 04:05 badbabys

Can't you train it yourself? With a GPU it takes about 3 hours.

May 28 '19 06:05 kedimomo

Here is the problem I ran into. I ran:
cd data/renMinRiBao/
python data_renmin_word.py
cd tensorflow/
python train.py pretrained
and got the following error:
train len: 24271
test len: 7585
word2id len 3917
Creating the data generator ...
Finished creating the data generator.
use pretrained embedding
begin to train...
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    model = Model(config,embedding_pre,dropout_keep=0.5)
  File "/home/liyang22/github/ChineseNER/tensorflow/bilstm_crf.py", line 20, in __init__
    self._build_net()
  File "/home/liyang22/github/ChineseNER/tensorflow/bilstm_crf.py", line 56, in _build_net
    self.viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(bilstm_out, self.transition_params,tf.tile(np.array([self.sen_len]),np.array([self.batch_size])))
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/crf/python/ops/crf.py", line 537, in crf_decode
    false_fn=_multi_seq_fn)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/layers/utils.py", line 206, in smart_cond
    pred, true_fn=true_fn, false_fn=false_fn, name=name)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/smart_cond.py", line 56, in smart_cond
    return false_fn()
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/crf/python/ops/crf.py", line 501, in _multi_seq_fn
    sequence_length_less_one = math_ops.maximum(0, sequence_length - 1)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4602, in maximum
    "Maximum", x=x, y=y, name=name)
  File "/home/liyang22/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Maximum' Op has type int64 that does not match type int32 of argument 'x'.

Jun 11 '19 03:06 bobkentt

@bobkentt Check whether something is wrong with your corpus. Did you write it yourself?

Jun 11 '19 06:06 kedimomo

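I just cloned the project as is and didn't use my own corpus. Could it be my TensorFlow version? Which version are you using? I have TensorFlow set up in two virtual machines, versions 1.10.0 and 1.12.0, and neither works.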

Jun 11 '19 07:06 bobkentt

Changing it to int64 in train.py didn't work either, and I also tried casting the data labels to int32.

Jun 11 '19 09:06 bobkentt

Did you delete the previously trained model files before retraining? I'm using tensorflow-gpu==1.10.0.

Jun 12 '19 07:06 kedimomo

@bobkentt

Jun 12 '19 07:06 kedimomo

@bobkentt Converting the sequence length to int32 fixes it (the potentials and transition matrix are float scores and should not be cast to int32):
self.viterbi_sequence, viterbi_score = tf.contrib.crf.crf_decode(bilstm_out, self.transition_params, tf.cast(sequence_length, dtype=tf.int32))
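Why the cast is needed: the traceback shows crf_decode computing math_ops.maximum(0, sequence_length - 1), where the literal 0 becomes an int32 tensor. In the original line, np.array([self.sen_len]) takes NumPy's default integer type, which is int64 on typical 64-bit Linux builds, so the tiled length vector is int64 and the two types clash. A sketch of building the lengths as int32 up front, using NumPy only; the helper name int32_lengths and the values 60 and 4 are illustrative assumptions:

```python
import numpy as np


def int32_lengths(sen_len, batch_size):
    """Hypothetical helper: a [batch_size] vector of lengths, explicitly int32.

    np.array([some_int]) would use the platform default integer instead,
    which is int64 on most 64-bit Linux systems.
    """
    return np.full(batch_size, sen_len, dtype=np.int32)


lengths = int32_lengths(sen_len=60, batch_size=4)
print(lengths.dtype, lengths.tolist())  # int32 [60, 60, 60, 60]
```

Passing such an int32 array as the third argument of crf_decode, in place of the tf.tile(...) expression, should avoid the mismatch without touching the float inputs.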

Jul 29 '19 08:07 bubblewu