[ALBERT] Tokenization crashes while trying to finetune classifier with TF Hub model
I'm trying to get ALBERT running locally with the following command line:
python -m albert.run_classifier_with_tfhub --task_name=MNLI --data_dir=./multinli_1.0 --albert_hub_module_handle=https://tfhub.dev/google/albert_large/1 --output_dir=./output --do_train=True
When the tokenizer is initialized from the TF Hub model, it crashes:
Traceback (most recent call last):
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
tf.app.run()
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 187, in main
tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
spm_model_file=FLAGS.spm_model_file)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 247, in __init__
self.vocab = load_vocab(vocab_file)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 201, in load_vocab
token = token.strip().split()[0]
IndexError: list index out of range
The immediate cause is a line that consists only of a newline character '\n', so token.strip().split() returns an empty list. However, even if I modify the code to skip such lines, it still crashes later with:
Traceback (most recent call last):
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
tf.app.run()
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 187, in main
tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
spm_model_file=FLAGS.spm_model_file)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 249, in __init__
self.vocab = load_vocab(vocab_file)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 198, in load_vocab
token = convert_to_unicode(reader.readline())
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 179, in readline
return self._prepare_value(self._read_buf.ReadLineAsString())
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 98, in _prepare_value
return compat.as_str_any(val)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 117, in as_str_any
return as_str(value)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 87, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 8: invalid start byte
I'm running the code on OS X Catalina, Anaconda, Python 3.6
sentencepiece 0.1.83 pypi_0 pypi
tensorflow 1.14.0 mkl_py36h933f829_0
tensorflow-base 1.14.0 mkl_py36h655c25b_0
tensorflow-estimator 1.14.0 py_0
tensorflow-hub 0.6.0 pyhe1b5a44_0 conda-forge
Same problem here. Ignoring empty lines leads to the second error, the encoding one.
I hit 3 errors in total and worked around them like this:
- Check whether token.strip().split() is empty
    temp = token.strip().split()
    if temp:
        token = temp[0]
        if token not in vocab:
            vocab[token] = len(vocab)
- Encoding error: open the file in binary mode
    with tf.gfile.GFile(vocab_file, "rb") as reader:
- The [CLS][SOS][EOS] tokens cannot be found in the vocab dict
    def convert_by_vocab(vocab, items):
        output = []
        for item in items:
            if item not in vocab:
                # Assign the next free id; len(item) would collide with existing ids.
                vocab[item] = len(vocab)
            output.append(vocab[item])
        return output
If the dict doesn't have the token, just add it to the dict.
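The blank-line check from the first bullet can be sketched as a standalone function. This is only an illustration, assuming a genuine plain-text vocabulary with one token per line; it uses Python's built-in open instead of tf.gfile so it runs without TensorFlow, and the name load_vocab_skip_blank is hypothetical, not part of the ALBERT code.

```python
import collections


def load_vocab_skip_blank(vocab_file):
    """Load a one-token-per-line vocabulary, skipping blank lines.

    Hypothetical helper: assumes vocab_file is UTF-8 text.
    """
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for line in reader:
            parts = line.strip().split()
            if not parts:
                # Empty or whitespace-only line: nothing to index.
                continue
            token = parts[0]
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab
```

A check like this removes the IndexError, but it cannot help if the file is not a text vocabulary in the first place.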
@zealrant Thanks for your suggestion. I've done the same thing, but I don't think the tokenizer works correctly anyway. For a perfectly normal English string like "one more pseudo generalization and i'm giving up." (from COLA) it outputs mostly [UNK] tokens, which doesn't make any sense to me.
INFO:tensorflow:*** Example ***
I1029 12:42:32.620660 4630658368 run_classifier_sp.py:479] *** Example ***
INFO:tensorflow:guid: train-3
I1029 12:42:32.625521 4630658368 run_classifier_sp.py:480] guid: train-3
INFO:tensorflow:tokens: [CLS] [UNK] [UNK] [UNK] [UNK] [UNK] , [UNK] [UNK] [UNK] [UNK] . [SEP]
I1029 12:42:32.636192 4630658368 run_classifier_sp.py:482] tokens: [CLS] [UNK] [UNK] [UNK] [UNK] [UNK] , [UNK] [UNK] [UNK] [UNK] . [SEP]
INFO:tensorflow:input_ids: 5 5 5 5 5 5 4984 5 5 5 5 1782 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.645191 4630658368 run_classifier_sp.py:483] input_ids: 5 5 5 5 5 5 4984 5 5 5 5 1782 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.648386 4630658368 run_classifier_sp.py:484] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.650033 4630658368 run_classifier_sp.py:485] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:label: 1 (id = 1)
I1029 12:42:32.657105 4630658368 run_classifier_sp.py:486] label: 1 (id = 1)
It turned out that the file the script tries to load as a vocabulary is in fact a saved SentencePiece model. Changing lines 159-161 did the trick:
    return tokenization.FullTokenizer(
        vocab_file=vocab_file, do_lower_case=do_lower_case,
        spm_model_file=vocab_file)
vocab_file is ignored when spm_model_file is set, and there is no way to pass a null vocab_file, so passing the same path for both causes no issue.
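One way to catch this kind of mix-up early is to check whether the file decodes as UTF-8 text before treating it as a vocabulary. The sketch below is a heuristic only, not something the ALBERT scripts do: the function name looks_like_utf8_text and the 4096-byte sample size are assumptions made for illustration. A binary file such as a serialized SentencePiece model will typically contain bytes that are invalid in UTF-8, like the 0xc0 from the traceback above.

```python
import codecs


def looks_like_utf8_text(path, sample_bytes=4096):
    """Return True if the first bytes of the file decode as UTF-8.

    Hypothetical helper: a heuristic, not a format check.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    # An incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the sample, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

If the check returns False for the file exported by the hub module, that is a hint to pass the path as spm_model_file rather than parse it line by line.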
I managed to get the training data loaded and preprocessed, but then training crashes later with:
ERROR:tensorflow:Error recorded from training_loop: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
E1029 17:14:13.403403 4515265856 error_handling.py:70] Error recorded from training_loop: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
INFO:tensorflow:training_loop marked as finished
I1029 17:14:13.404175 4515265856 error_handling.py:96] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1029 17:14:13.404798 4515265856 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py", line 673, in _GradientsHelper
grad_fn = ops.get_gradient_function(op)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2775, in get_gradient_function
return _gradient_registry.lookup(op_type)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/registry.py", line 97, in lookup
"%s registry has no entry for: %s" % (self._name, name))
LookupError: gradient registry has no entry for: AddV2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
run()
File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 342, in run_module
run_module_as_main(target, alter_argv=True)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
tf.app.run()
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 244, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
config)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 116, in model_fn
total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)
File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/optimization.py", line 101, in create_optimizer
grads = tf.gradients(loss, tvars)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py", line 689, in _GradientsHelper
(op.name, op.type))
LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
I got exactly the same error. Did you figure out why?