RuntimeError (Sizes of tensors must match) when training on 'WikiSQL'
Hi,
I followed the steps to train on Spider & WikiSQL on a Tesla M40 (24GB memory) with 'train_batch_size=4' (no other changes were made to the model configuration):
# wikisql-bridge-bert-large.sh
num_steps=30000
curriculum_interval=0
num_peek_steps=400
num_accumulation_steps=3
save_best_model_only="True"
train_batch_size=4 # from 16 to 4
It works well on the Spider dataset, but when it comes to WikiSQL, I get the following error:
--------------------------
wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.21 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210303_163242-23o2pmxp
wandb: Syncing run wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2
wandb: ⭐️ View project at https://app.wandb.ai/zjy/smore-wikisql-group--final
wandb: 🚀 View run at https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
wandb: Run `wandb off` to turn off syncing.
2%|█▉ | 19/1200 [00:08<08:29, 2.32it/s]
Traceback (most recent call last):
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 407, in <module>
run_experiment(args)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
train(sp)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 63, in train
sp.run_train(train_data, dev_data)
File "/data/users/zjy/TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
loss = self.loss(formatted_batch)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
outputs = self.forward(formatted_batch)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
decoder_ptr_value_ids=decoder_ptr_value_ids)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 59, in forward
transformer_output_value_masks)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 263, in forward
schema_hiddens = self.schema_encoder(schema_hiddens, feature_ids)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 169, in forward
field_type_embeddings], dim=2))
RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 11 (The offending index is 0)
wandb: Waiting for W&B process to finish, PID 957961
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb: _runtime 83.59444427490234
wandb: learning_rate/wikisql 0.0003
wandb: _step 1
wandb: _timestamp 1614789212.7551596
wandb: fine_tuning_rate/wikisql 1.6666666666666667e-08
wandb: Syncing files in wandb/run-20210303_163242-23o2pmxp:
wandb: code/src/experiments.py
wandb: plus 8 W&B file(s) and 1 media file(s)
wandb:
wandb: Synced wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2: https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
I also tried a train_batch_size of 2, but it still fails; the same error occurred when switching to a GeForce GTX Titan Xp (12GB) or a Tesla K80 (11GB). Any suggestions on what might be causing this, or what I can try to get rid of it? Thank you!
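For reference, the failing line concatenates the schema hidden states with the field type embeddings along the last dimension, so the error seems to mean the two tensors disagree on the number of schema items. A minimal snippet that triggers the same class of error (the shapes are illustrative, not the repo's real sizes):

import torch

# torch.cat along dim 2 requires dims 0 and 1 to match; here dim 1
# (the number of schema items) disagrees, mirroring "Got 9 and 11".
schema_hiddens = torch.randn(1, 9, 512)         # encoder produced 9 items
field_type_embeddings = torch.randn(1, 11, 16)  # features built for 11 items
torch.cat([schema_hiddens, field_type_embeddings], dim=2)
# RuntimeError: Sizes of tensors must match except in dimension ...
# (exact wording varies across PyTorch versions)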
I get the same error too. Haven't made any changes to the configuration.
Sorry, I introduced this bug with the checkpoint release.
A temporary fix for Spider is to comment out line https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L31 and uncomment line https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L30.
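In other words, the swap toggles which symbol is used as the wildcard marker when the hybrid sequence is built. A rough sketch of the two lines after the fix (the identifier and token id here are assumptions for illustration; check the linked lines for the actual code):

# src/utils/trans/bert_utils.py (sketch only, not the verbatim source)
asterisk_marker = '[unused30]'  # L30: reserved vocab token, uncomment this
# asterisk_marker = '*'         # L31: literal '*', comment this out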
The issue is that the released pre-trained checkpoints use "*" in the hybrid sequence as the wildcard representation (our implementation treats the wildcard as a special column in the database), but the WikiSQL data is noisy and some text in the dataset contains "*", which causes the model to mis-estimate the number of columns in the database.
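To make the failure mode concrete: a literal "*" coming from a noisy cell value is indistinguishable from the wildcard marker in the hybrid sequence, so the schema encoder infers extra phantom columns while the per-column feature embeddings are still sized by the true schema, and the concatenation at bridge.py line 169 fails (e.g. 9 vs 11). A toy sketch of the counting problem (the helper below is hypothetical; the real segmentation logic lives in the repo):

def num_schema_items(tokens, marker='*'):
    # Hypothetical stand-in for segmenting the hybrid sequence:
    # every marker token is taken to start a new schema item.
    return sum(tok == marker for tok in tokens)

clean = ['[CLS]', 'how', 'many', 'wins', '[SEP]', '*', 'player', 'wins']
noisy = clean + ['3', '*']  # WordPiece splits a cell value like "3*" into '3', '*'

print(num_schema_items(clean))  # -> 1: just the wildcard column, as intended
print(num_schema_items(noisy))  # -> 2: the value's '*' is mistaken for a column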
I will push a more stable fix later.