
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found. (0) Invalid argument: Assign requires shapes of both tensors to match

Open herogan2017 opened this issue 5 years ago • 3 comments

When I run python gtp.py with the 000990-pallas model, it outputs the following error. How can I solve it? Thanks.

I0521 23:15:21.790073 140320694859520 saver.py:1284] Restoring parameters from /home/gzd/Others/Backup/minigo/model/000990-pallas/v17-19x19_models_000990-pallas
Traceback (most recent call last):
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [361,256] rhs shape= [128,512]
     [[{{node save/Assign_327}}]]
     [[save/RestoreV2/_536]]
  (1) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [361,256] rhs shape= [128,512]
     [[{{node save/Assign_327}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [361,256] rhs shape= [128,512]
     [[node save/Assign_327 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_536]]
  (1) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [361,256] rhs shape= [128,512]
     [[node save/Assign_327 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/Assign_327':
  File "gtp.py", line 93, in <module>
    app.run(main)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "gtp.py", line 85, in main
    minigui_mode=FLAGS.minigui_mode)
  File "gtp.py", line 52, in make_gtp_instance
    n = DualNetwork(load_file)
  File "/home/gzd/Others/Backup/minigo/dual_net.py", line 187, in __init__
    self.initialize_graph()
  File "/home/gzd/Others/Backup/minigo/dual_net.py", line 202, in initialize_graph
    self.initialize_weights(self.save_file)
  File "/home/gzd/Others/Backup/minigo/dual_net.py", line 212, in initialize_weights
    tf.train.Saver().restore(self.sess, save_file)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 350, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saving/saveable_object_util.py", line 73, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/state_ops.py", line 227, in assign
    validate_shape=validate_shape)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_state_ops.py", line 66, in assign
    use_locking=use_locking, name=name)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

herogan2017 • May 21 '20 15:05

v17 was 40-block, so try running with --trunk_layers=39, or change the default value of the flag in dual_net.py.

(It might be 40 instead of 39; I forget.)

amj • May 21 '20 16:05
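One way to narrow down a restore failure like this is to compare the variables the graph expects with the ones stored in the checkpoint. The sketch below is illustrative only: diff_variables is a hypothetical helper (not part of minigo), the variable names are made up, and only the two shapes are taken from the error message above. With TensorFlow 1.x the two maps could probably be filled from tf.train.list_variables(save_file) and tf.global_variables() once the graph has been built, but that step is left out here so the example runs without TensorFlow.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_variables(
    graph_vars: Dict[str, Shape],
    ckpt_vars: Dict[str, Shape],
) -> Tuple[List[str], List[Tuple[str, Shape, Shape]]]:
    """Compare name -> shape maps for the graph and the checkpoint.

    Returns (missing, mismatched):
      missing    : names the graph expects but the checkpoint lacks
      mismatched : (name, graph_shape, ckpt_shape) where both exist but differ
    """
    missing = sorted(name for name in graph_vars if name not in ckpt_vars)
    mismatched = sorted(
        (name, tuple(shape), tuple(ckpt_vars[name]))
        for name, shape in graph_vars.items()
        if name in ckpt_vars and tuple(ckpt_vars[name]) != tuple(shape)
    )
    return missing, mismatched


# Hypothetical variable names; the shapes [361,256] and [128,512] are the
# lhs (graph) and rhs (checkpoint) shapes reported in the error message.
graph_vars = {
    "dense_1/kernel": (361, 256),
    "extra_layer/kernel": (3, 3, 256, 256),
}
ckpt_vars = {
    "dense_1/kernel": (128, 512),
}

missing, mismatched = diff_variables(graph_vars, ckpt_vars)
print("missing from checkpoint:", missing)
print("shape mismatches:", mismatched)
```

A non-empty result in either list means the graph was built with settings that differ from the ones used to train the checkpoint, so the flags need to be adjusted until both lists come back empty.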

Hello, thank you for your reply. I tried changing the value in dual_net.py as follows:

    # flags.DEFINE_integer('trunk_layers', go.N,
    #                      'The number of resnet layers in the shared trunk.')
    flags.DEFINE_integer('trunk_layers', 39, 'The number of resnet layers in the shared trunk.')

But it still doesn't work.

I0522 00:44:33.190119 140452096435968 saver.py:1284] Restoring parameters from /home/gzd/Others/minigo/minigo-models/models/000990-pallas/v17-19x19_models_000990-pallas
2020-05-22 00:44:34.526539: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key batch_normalization_39/beta not found in checkpoint
Traceback (most recent call last):
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[{{node save/RestoreV2}}]]
  (1) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[{{node save/RestoreV2}}]]
     [[save/RestoreV2/_49]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[node save/RestoreV2 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[node save/RestoreV2 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_49]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "gtp.py", line 93, in <module>
    app.run(main)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "gtp.py", line 85, in main
    minigui_mode=FLAGS.minigui_mode)
  File "gtp.py", line 52, in make_gtp_instance
    n = DualNetwork(load_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 190, in __init__
    self.initialize_graph()
  File "/home/gzd/Others/minigo/dual_net.py", line 205, in initialize_graph
    self.initialize_weights(self.save_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 215, in initialize_weights
    tf.train.Saver().restore(self.sess, save_file)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gtp.py", line 93, in <module>
    app.run(main)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "gtp.py", line 85, in main
    minigui_mode=FLAGS.minigui_mode)
  File "gtp.py", line 52, in make_gtp_instance
    n = DualNetwork(load_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 190, in __init__
    self.initialize_graph()
  File "/home/gzd/Others/minigo/dual_net.py", line 205, in initialize_graph
    self.initialize_weights(self.save_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 215, in initialize_weights
    tf.train.Saver().restore(self.sess, save_file)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1306, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[node save/RestoreV2 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key batch_normalization_39/beta not found in checkpoint
     [[node save/RestoreV2 (defined at /home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_49]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "gtp.py", line 93, in <module>
    app.run(main)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "gtp.py", line 85, in main
    minigui_mode=FLAGS.minigui_mode)
  File "gtp.py", line 52, in make_gtp_instance
    n = DualNetwork(load_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 190, in __init__
    self.initialize_graph()
  File "/home/gzd/Others/minigo/dual_net.py", line 205, in initialize_graph
    self.initialize_weights(self.save_file)
  File "/home/gzd/Others/minigo/dual_net.py", line 215, in initialize_weights
    tf.train.Saver().restore(self.sess, save_file)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/gzd/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

herogan2017 • May 21 '20 16:05
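When restore reports a missing key such as batch_normalization_39/beta, it can help to see which batch-norm layers the checkpoint does contain. The sketch below is a rough, hypothetical helper (layer_indices is not part of minigo or TensorFlow) that collects the numeric suffixes from a list of variable names. It assumes the usual TF 1.x default layer naming, where the first layer has no suffix and later ones are numbered _1, _2, and so on. The names in the example are invented; in practice they could come from tf.train.list_variables on the checkpoint path.

```python
import re
from typing import Iterable, List


def layer_indices(names: Iterable[str],
                  prefix: str = "batch_normalization") -> List[int]:
    """Return the sorted layer indices found for `prefix` in variable names.

    "batch_normalization/beta"    -> index 0 (no numeric suffix)
    "batch_normalization_7/gamma" -> index 7
    Names that do not match the prefix pattern are ignored.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"(?:_(\d+))?/")
    found = set()
    for name in names:
        match = pattern.match(name)
        if match:
            found.add(int(match.group(1)) if match.group(1) else 0)
    return sorted(found)


# Invented variable names, for illustration only.
names = [
    "batch_normalization/beta",
    "batch_normalization/gamma",
    "batch_normalization_1/beta",
    "batch_normalization_38/beta",
    "conv2d/kernel",
    "batch_normalization_extra/beta",
]

present = layer_indices(names)
print("batch-norm layer indices in checkpoint:", present)
```

Comparing the highest index found in the checkpoint with the index named in the error gives a quick hint about whether the graph is being built with more layers than the checkpoint holds.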

@herogan2017 same problem. Did you fix this issue?

huynq55 • Dec 24 '20 13:12