[ALBERT] Tokenization crashes while trying to finetune classifier with TF Hub model

Open vbougay opened this issue 6 years ago • 6 comments

I'm trying to get ALBERT running locally with the following command line:

python -m albert.run_classifier_with_tfhub --task_name=MNLI --data_dir=./multinli_1.0 --albert_hub_module_handle=https://tfhub.dev/google/albert_large/1 --output_dir=./output --do_train=True

When the tokenizer is initialized from the TF Hub model, it crashes:

Traceback (most recent call last):
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
    tf.app.run()
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 187, in main
    tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
    spm_model_file=FLAGS.spm_model_file)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 247, in __init__
    self.vocab = load_vocab(vocab_file)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 201, in load_vocab
    token = token.strip().split()[0]
IndexError: list index out of range

The crash is triggered by a line that consists only of a newline character ('\n'). However, even if I modify the code to skip such lines, it still crashes later with:

Traceback (most recent call last):
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
    tf.app.run()
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 187, in main
    tokenizer = create_tokenizer_from_hub_module(FLAGS.albert_hub_module_handle)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 161, in create_tokenizer_from_hub_module
    spm_model_file=FLAGS.spm_model_file)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 249, in __init__
    self.vocab = load_vocab(vocab_file)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/tokenization.py", line 198, in load_vocab
    token = convert_to_unicode(reader.readline())
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 179, in readline
    return self._prepare_value(self._read_buf.ReadLineAsString())
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 98, in _prepare_value
    return compat.as_str_any(val)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 117, in as_str_any
    return as_str(value)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 87, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 8: invalid start byte
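
An "invalid start byte" at this point usually means the file is not UTF-8 text at all. The following is a minimal sketch, using only the standard library, of a check that could be run on the file before treating it as a vocabulary. `looks_like_utf8_text` and its `probe_bytes` parameter are hypothetical names, not part of ALBERT or TensorFlow.

```python
import codecs


def looks_like_utf8_text(path, probe_bytes=4096):
    """Return True if the first `probe_bytes` of the file decode as UTF-8 text."""
    with open(path, "rb") as f:
        chunk = f.read(probe_bytes)
    # An incremental decoder tolerates a multi-byte character that is cut off
    # at the end of the probe, but still rejects bytes such as 0xc0.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    # NUL bytes are valid UTF-8 but do not occur in a plain-text vocabulary.
    return b"\x00" not in chunk
```

If the check returns False for the file handed to `load_vocab`, skipping blank lines will not help, because the file is binary rather than a list of tokens.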

I'm running the code on macOS Catalina with Anaconda and Python 3.6. Relevant packages:

sentencepiece             0.1.83                   pypi_0    pypi
tensorflow                1.14.0          mkl_py36h933f829_0  
tensorflow-base           1.14.0          mkl_py36h655c25b_0  
tensorflow-estimator      1.14.0                     py_0  
tensorflow-hub            0.6.0              pyhe1b5a44_0    conda-forge

vbougay avatar Oct 28 '19 08:10 vbougay

Same problem here. Ignoring empty lines leads to the second (encoding) error.

donggyukimc avatar Oct 28 '19 13:10 donggyukimc

There are 3 errors. I worked around them like this:

  1. Check whether token.strip().split() is empty:

      temp = token.strip().split()
      if temp:  # skip blank lines
        token = temp[0]
        if token not in vocab:
          vocab[token] = len(vocab)

  2. Encoding error: open the file in binary mode:

      with tf.gfile.GFile(vocab_file, "rb") as reader:

  3. The [CLS], [SOS] and [EOS] tokens cannot be found in the vocab dict. If the dict doesn't have a token, just add it:

      def convert_by_vocab(vocab, items):
        output = []
        for item in items:
          if item not in vocab:
            vocab[item] = len(vocab)  # next free id, so ids stay unique
          output.append(vocab[item])
        return output
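
The blank-line check in the first workaround can be tried without TensorFlow. The sketch below is a simplified stand-in for `load_vocab`, not the function from `tokenization.py`: the name `load_vocab_lines` is hypothetical, and it takes an iterable of already decoded lines instead of a file path. It only helps when the input really is a text vocabulary.

```python
import collections


def load_vocab_lines(lines):
    """Build a token -> id mapping, skipping blank lines and repeated tokens."""
    vocab = collections.OrderedDict()
    for line in lines:
        parts = line.strip().split()
        if not parts:
            # A bare "\n" splits into an empty list; indexing it with [0]
            # is what raised the IndexError in the first traceback.
            continue
        token = parts[0]
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


print(load_vocab_lines(["[PAD]\n", "\n", "the 0.5\n", "the\n"]))
```

Ids are assigned in order of first appearance, so a blank line no longer shifts or breaks the mapping.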

zealrant avatar Oct 29 '19 01:10 zealrant

@zealrant Thanks for your suggestion. I've done the same thing, but I don't think the tokenizer works correctly anyway. For a perfectly normal English string like "one more pseudo generalization and i'm giving up." (from CoLA), it outputs mostly [UNK] tokens, which doesn't make any sense to me.

INFO:tensorflow:*** Example ***
I1029 12:42:32.620660 4630658368 run_classifier_sp.py:479] *** Example ***
INFO:tensorflow:guid: train-3
I1029 12:42:32.625521 4630658368 run_classifier_sp.py:480] guid: train-3
INFO:tensorflow:tokens: [CLS] [UNK] [UNK] [UNK] [UNK] [UNK] , [UNK] [UNK] [UNK] [UNK] . [SEP]
I1029 12:42:32.636192 4630658368 run_classifier_sp.py:482] tokens: [CLS] [UNK] [UNK] [UNK] [UNK] [UNK] , [UNK] [UNK] [UNK] [UNK] . [SEP]
INFO:tensorflow:input_ids: 5 5 5 5 5 5 4984 5 5 5 5 1782 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.645191 4630658368 run_classifier_sp.py:483] input_ids: 5 5 5 5 5 5 4984 5 5 5 5 1782 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.648386 4630658368 run_classifier_sp.py:484] input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I1029 12:42:32.650033 4630658368 run_classifier_sp.py:485] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:label: 1 (id = 1)
I1029 12:42:32.657105 4630658368 run_classifier_sp.py:486] label: 1 (id = 1)

vbougay avatar Oct 29 '19 07:10 vbougay

It turned out that the file the script tries to load as a vocabulary is in fact a saved SentencePiece model. Changing lines 159-161 did the trick:

  return tokenization.FullTokenizer(
      vocab_file=vocab_file, do_lower_case=do_lower_case,
      spm_model_file=vocab_file)

vocab_file is ignored when spm_model_file is set, and there is no way to pass a null vocab_file, so passing the same path for both arguments does no harm.
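
To make that precedence concrete, here is a toy sketch. `ToyFullTokenizer` is a hypothetical stand-in, not the class from `tokenization.py`, and both loaders are fakes injected so the example runs without TensorFlow or sentencepiece. It mirrors only the behaviour described above: when `spm_model_file` is set, the text vocabulary loader is never called.

```python
class ToyFullTokenizer:
    """Toy stand-in that mirrors only the argument precedence described above."""

    def __init__(self, vocab_file, spm_model_file=None,
                 load_vocab=None, load_spm=None):
        if spm_model_file:
            # SentencePiece path: vocab_file is not touched at all.
            self.backend = "sentencepiece"
            self.model = load_spm(spm_model_file)
        else:
            self.backend = "vocab"
            self.model = load_vocab(vocab_file)


def failing_text_loader(path):
    # Simulates what happens when a binary file is read as a text vocabulary.
    raise UnicodeDecodeError("utf-8", b"\xc0", 0, 1, "invalid start byte")


def fake_spm_loader(path):
    # Placeholder for loading a real SentencePiece model.
    return {"loaded_from": path}


tok = ToyFullTokenizer(
    vocab_file="albert.model",      # placeholder path, never opened
    spm_model_file="albert.model",
    load_vocab=failing_text_loader,
    load_spm=fake_spm_loader,
)
print(tok.backend)  # prints: sentencepiece
```

Dropping the `spm_model_file` argument in this toy makes the constructor fall through to the text loader and raise, which is the shape of the original crash.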

I managed to get the training data loaded and preprocessed, but then training crashes further on with:

ERROR:tensorflow:Error recorded from training_loop: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
E1029 17:14:13.403403 4515265856 error_handling.py:70] Error recorded from training_loop: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
INFO:tensorflow:training_loop marked as finished
I1029 17:14:13.404175 4515265856 error_handling.py:96] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1029 17:14:13.404798 4515265856 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py", line 673, in _GradientsHelper
    grad_fn = ops.get_gradient_function(op)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2775, in get_gradient_function
    return _gradient_registry.lookup(op_type)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/registry.py", line 97, in lookup
    "%s registry has no entry for: %s" % (self._name, name))
LookupError: gradient registry has no entry for: AddV2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/Users/vladimirbugay/.vscode-insiders/extensions/ms-python.python-2019.11.43735-dev/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 342, in run_module
    run_module_as_main(target, alter_argv=True)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 320, in <module>
    tf.app.run()
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 244, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
    rendezvous.raise_errors()
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
    six.reraise(typ, value, traceback)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
    config)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/run_classifier_with_tfhub.py", line 116, in model_fn
    total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)
  File "/Users/vladimirbugay/Knoema/GitHub/google-research/albert/optimization.py", line 101, in create_optimizer
    grads = tf.gradients(loss, tvars)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "/Users/vladimirbugay/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py", line 689, in _GradientsHelper
    (op.name, op.type))
LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_23/layer_23/inner_group_0/LayerNorm_1/batchnorm/add_1' (op type: AddV2)
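
A missing gradient for a single op type often points to a version mismatch between the TensorFlow that exported the module and the one running it. That is only a guess here, and nothing in this thread confirms it. Under that assumption, and the further assumption that a gradient for `AddV2` first ships in TensorFlow 1.15, a quick check could look like this. `may_lack_addv2_gradient` is a hypothetical helper.

```python
def version_tuple(version):
    """Turn '1.14.0' into (1, 14); only major and minor are compared."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def may_lack_addv2_gradient(tf_version, first_ok=(1, 15)):
    # ASSUMPTION: (1, 15) is a guessed threshold, not taken from this thread.
    return version_tuple(tf_version) < first_ok


print(may_lack_addv2_gradient("1.14.0"))  # the version listed in this report
```

With TensorFlow installed, `tf.__version__` would be the natural argument to pass in.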

vbougay avatar Oct 29 '19 15:10 vbougay

I got exactly the same error. Did you figure out why?

wxp16 avatar Oct 30 '19 14:10 wxp16

Take a look at this issue

PavelKovalets avatar Nov 02 '19 09:11 PavelKovalets