Problems training the model on a Chinese dataset of 2.4 million samples; not sure how to fix them

Open • TWJ718925 opened this issue 6 years ago • 1 comment

Hello! I'm a beginner and I'm very interested in your paper, so I wanted to see how the model would perform when trained on a Chinese dataset. However, training on a Chinese dataset of 2.4 million samples runs into a problem that I haven't been able to solve on my own (probably due to my limited experience), so I'm turning to you for help. During training it keeps failing with `tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array`. Only the `pos` dataset runs through for me; the `neg` dataset fails with the same error. I'm running the program on a server. With the Chinese dataset, training only gets as far as printing `epoch:0 test_bleu:30.07800579071045 template_bleu:79.62971329689026 test_loss:6.98167085647583 test_ppl:1187.98974609375`, and then the following error appears:

```
root@a8f8e2b9891d:/notebooks# python self_attn.py --mask_rate 0.2 --blank_num 2 --filename_prefix 'data.' --data_dir './yelp_data/data/'
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
train_file:/notebooks/yelp_data/data/data.train.txt
valid_file:/notebooks/yelp_data/data/data.valid.txt
logdir:./log_dir/data.bsize150.epoch120.seqlen64.dynamic_lr.present0.8.partition2.hidden256.self_attn/
WARNING:tensorflow:From /notebooks/texar/utils/beam_search.py:87: calling reduce_logsumexp (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
epoch:0 test_bleu:30.07800579071045 template_bleu:79.62971329689026 test_loss:6.98167085647583 test_ppl:1187.98974609375
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array
     [[Node: PyFunc_6 = PyFunc[Tin=[DT_INT64, DT_INT64], Tout=[DT_INT64, DT_INT64], token="pyfunc_6", _device="/job:localhost/replica:0/task:0/device:CPU:0"](PyFunc_5, Variable_7/read)]]
     [[Node: decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather/_1765 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12352_decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
```
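For context on the message itself: NumPy raises `IndexError: too many indices for array` when an array is indexed with more subscripts than it has dimensions. The failing node, `PyFunc_6`, wraps a Python function created in texar's `_get_start_end_pos`, so the error comes from NumPy code running inside that function rather than from a TensorFlow kernel. The sketch below only reproduces the NumPy behaviour; the helper `first_column` is hypothetical and is not taken from texar.

```python
import numpy as np


def first_column(batch):
    """Hypothetical helper (not from texar): returns the first token id of
    every row, so it silently assumes a 2-D [batch_size, seq_len] array."""
    return batch[:, 0]


two_d = np.array([[5, 3, 3], [7, 3, 9]])  # shape (2, 3): indexing works
print(first_column(two_d))                # [5 7]

one_d = np.array([5, 3, 3])               # shape (3,): only one axis
try:
    first_column(one_d)                   # two subscripts on a 1-D array
except IndexError as err:
    # Older NumPy prints just "too many indices for array"; newer versions
    # append a note about the actual number of dimensions.
    print("IndexError:", err)
```

If something similar happens inside `_get_start_end_pos`, some batch of the Chinese (and `neg`) data would be reaching it with a different shape than the function expects. That is a hypothesis; the traceback alone does not confirm it.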

Sep 03 '19 07:09 TWJ718925

Here is the full error output:

```
root@a8f8e2b9891d:/notebooks# python self_attn.py --mask_rate 0.2 --blank_num 2 --filename_prefix 'data.' --data_dir './yelp_data/data/'
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
train_file:/notebooks/yelp_data/data/data.train.txt
valid_file:/notebooks/yelp_data/data/data.valid.txt
logdir:./log_dir/data.bsize150.epoch120.seqlen64.dynamic_lr.present0.8.partition2.hidden256.self_attn/
WARNING:tensorflow:From /notebooks/texar/utils/beam_search.py:87: calling reduce_logsumexp (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
epoch:0 test_bleu:30.07800579071045 template_bleu:79.62971329689026 test_loss:6.98167085647583 test_ppl:1187.98974609375
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array
     [[Node: PyFunc_6 = PyFunc[Tin=[DT_INT64, DT_INT64], Tout=[DT_INT64, DT_INT64], token="pyfunc_6", _device="/job:localhost/replica:0/task:0/device:CPU:0"](PyFunc_5, Variable_7/read)]]
     [[Node: decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather/_1765 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12352_decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "self_attn.py", line 335, in <module>
    tf.app.run(main=_main)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "self_attn.py", line 319, in _main
    train_bleu_scores, _ = _test_epoch(sess, epoch, mode='train')
  File "self_attn.py", line 201, in _test_epoch
    rtns = cur_sess.run(fetches, feed_dict=feed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: IndexError: too many indices for array
     [[Node: PyFunc_6 = PyFunc[Tin=[DT_INT64, DT_INT64], Tout=[DT_INT64, DT_INT64], token="pyfunc_6", _device="/job:localhost/replica:0/task:0/device:CPU:0"](PyFunc_5, Variable_7/read)]]
     [[Node: decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather/_1765 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12352_decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'PyFunc_6', defined at:
  File "self_attn.py", line 335, in <module>
    tf.app.run(main=_main)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "self_attn.py", line 85, in _main
    mask_id, eoa_id, pad_id)
  File "/notebooks/texar/utils/transformer_utils.py", line 730, in update_template_pack
    start_positions, end_positions = _get_start_end_pos(masked_inputs, mask_id)
  File "/notebooks/texar/utils/transformer_utils.py", line 559, in _get_start_end_pos
    [tf.int64, tf.int64])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 317, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 225, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 93, in _py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

UnknownError (see above for traceback): IndexError: too many indices for array
     [[Node: PyFunc_6 = PyFunc[Tin=[DT_INT64, DT_INT64], Tout=[DT_INT64, DT_INT64], token="pyfunc_6", _device="/job:localhost/replica:0/task:0/device:CPU:0"](PyFunc_5, Variable_7/read)]]
     [[Node: decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather/_1765 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12352_decoder_2/layer_5/ffn_1/ffn/conv1/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
```
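Because the same command runs on the `pos` data but fails on `neg` and on the Chinese data, one thing worth ruling out is the format of the data files themselves. The sketch below is a hypothetical checker, not part of texar or this repository. It counts empty lines, lines that contain only one whitespace-separated token (which is what unsegmented Chinese text looks like to a whitespace tokenizer), and lines longer than 64 tokens. The limit of 64 is an assumption read off the `seqlen64` part of the log directory name, and whether any of these cases actually triggers the error is unverified.

```python
from collections import Counter


def scan_corpus(path, max_len=64):
    """Hypothetical sanity check for a one-sentence-per-line text file.

    Splits each line on whitespace and flags lines that are empty, that
    contain a single token (what unsegmented Chinese text looks like to a
    whitespace tokenizer), or that exceed `max_len` tokens. The default of
    64 is an assumption based on `seqlen64` in the log directory name.
    Returns (counts, flagged) where flagged is a list of (line_no, reason).
    """
    counts = Counter()
    flagged = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            counts["lines"] += 1
            n_tokens = len(line.split())
            if n_tokens == 0:
                reason = "empty"
            elif n_tokens == 1:
                reason = "single_token"
            elif n_tokens > max_len:
                reason = "too_long"
            else:
                continue  # line looks normal
            counts[reason] += 1
            flagged.append((line_no, reason))
    return counts, flagged
```

For example, `counts, flagged = scan_corpus("/notebooks/yelp_data/data/data.train.txt")` followed by `print(counts, flagged[:20])` gives a quick overview; running it on the `pos`, `neg`, and Chinese files side by side would show whether the failing files differ in any of these respects. Opening the file with `encoding="utf-8"` also surfaces encoding problems as a `UnicodeDecodeError`.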
