Failed loading model: Could not find key /_1 in the model file
@OfirArviv, this is an issue we've been talking about a bit. I guess it could be called a DyNet issue, but I'll try to work around it.
In the mrp branch, whenever I train a model without BERT and then try to load it, I get this error:
...
[dynet] 2.1
Loading from 'test_files/models/ucca.enum'... Done (0.000s).
Loading model from 'test_files/models/ucca': 0%| | 0/13 [00:00<?, ?param/s]Traceback (most recent call last):
File "tupa/tupa/model.py", line 234, in load
self.classifier.load(self.filename)
File "tupa/tupa/classifiers/classifier.py", line 125, in load
self.load_model(filename, d)
File "tupa/tupa/classifiers/nn/neural_network.py", line 474, in load_model
values = self.load_param_values(filename, d)
File "tupa/classifiers/nn/neural_network.py", line 503, in load_param_values
desc="Loading model from '%s'" % filename, unit="param"))
File "tupa/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1005, in __iter__
for obj in iterable:
File "_dynet.pyx", line 450, in load_generator
File "_dynet.pyx", line 453, in _dynet.load_generator
File "_dynet.pyx", line 327, in _dynet._load_one
File "_dynet.pyx", line 1482, in _dynet.ParameterCollection.load_lookup_param
File "_dynet.pyx", line 1497, in _dynet.ParameterCollection.load_lookup_param
RuntimeError: Could not find key /_1 in the model file
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "tupa/tupa/parse.py", line 653, in <module>
main()
File "tupa/tupa/parse.py", line 649, in main
list(main_generator())
File "tupa/tupa/parse.py", line 631, in main_generator
yield from train_test(test=args.input, args=args)
File "tupa/tupa/parse.py", line 557, in train_test
yield from filter(None, parser.train(train, dev=dev, test=test is not None, iterations=args.iterations))
File "tupa/tupa/parse.py", line 457, in train
self.model.load()
File "tupa/tupa/model.py", line 244, in load
raise IOError("Failed loading model from '%s'" % self.filename) from e
OSError: Failed loading model from 'test_files/models/ucca'
Looking at the model .data file, I couldn't find any problem: it certainly contains a line starting with #LookupParameter# /_1.
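For anyone who wants to repeat this check without eyeballing the file, here is a minimal sketch that lists the parameter keys declared in a DyNet text model file. It assumes that header lines start with a tag followed by the key, as in the #LookupParameter# /_1 line above; the #Parameter# tag for dense parameters and the helper names list_param_keys and has_key are my own assumptions, not part of TUPA or DyNet.

```python
from typing import Iterable, List

# "#LookupParameter#" is the tag seen in the .data file; "#Parameter#" is
# assumed here to be the analogous tag for dense parameters.
PARAM_TAGS = ("#Parameter#", "#LookupParameter#")


def list_param_keys(lines: Iterable[str]) -> List[str]:
    """Return parameter keys (e.g. '/_1') from the header lines of a model file."""
    keys = []
    for line in lines:
        parts = line.split()
        # Header lines: "<tag> <key> ..."; value lines are plain numbers.
        if len(parts) >= 2 and parts[0] in PARAM_TAGS:
            keys.append(parts[1])
    return keys


def has_key(lines: Iterable[str], key: str) -> bool:
    """True if a parameter with this key is declared in the file."""
    return key in list_param_keys(lines)
```

Usage would be something like `with open(path) as f: print(list_param_keys(f))`, where path is the .data file saved next to test_files/models/ucca (the exact file name is assumed, not taken from the log).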
However, looking at the training log, I found that all updates had resulted in this error:
Error in update(): Magnitude of gradient is bad: -nan
I could then reproduce the problem by adding (loss/0).backward() before self.trainer.update() in https://github.com/danielhers/tupa/blob/20e7b12e3c91a5839c73db8433f0968fbd965bc6/tupa/classifiers/nn/neural_network.py#L422
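As far as I understand, this error comes from a sanity check on the overall gradient norm before the trainer applies an update, which is why forcing a division by zero reproduces it. The sketch below illustrates that kind of check in plain Python; it is a hypothetical stand-in, not DyNet's actual implementation, and it assumes gradients are given as flat lists of floats.

```python
import math
from typing import Iterable, Sequence


def global_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """L2 norm over the gradient values of all parameters."""
    total = 0.0
    for grad in grads:
        for value in grad:
            total += value * value  # a single NaN or inf poisons the whole sum
    return math.sqrt(total)


def check_gradient(grads: Iterable[Sequence[float]]) -> float:
    """Return the global norm, or raise if it is NaN or infinite."""
    norm = global_grad_norm(grads)
    if not math.isfinite(norm):
        # Hypothetical message, modeled on the one in the training log.
        raise RuntimeError("Magnitude of gradient is bad: %s" % norm)
    return norm
```

One non-finite value in any single parameter is enough to make the global norm non-finite, so the check rejects the whole update rather than just the offending parameter.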
So now the only question is why all updates result in -nan gradients in the unmodified code.
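One way to narrow this down is to report which parameters actually carry non-finite gradients after backward(). The sketch below assumes the gradients have already been collected into a dict from parameter name to a flat list of floats; with DyNet they might be obtained from each parameter's grad_as_array(), but that wiring is an assumption and is not shown here. The helper name nonfinite_params is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def nonfinite_params(named_grads: Dict[str, Sequence[float]]) -> List[str]:
    """Names of parameters whose gradient contains NaN or inf, in input order."""
    return [
        name
        for name, grad in named_grads.items()
        if not all(math.isfinite(v) for v in grad)
    ]
```

Checking whether the loss value itself is already NaN before calling backward() would also separate a forward-pass problem from a backward-pass one.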