RuntimeError (Sizes of tensors must match) when training on 'WikiSQL'
Hi,
I followed the steps to train on Spider & WikiSQL on a Tesla M40 (24GB memory) with 'train_batch_size=4' (no other changes were made to the model configuration):
# wikisql-bridge-bert-large.sh
num_steps=30000
curriculum_interval=0
num_peek_steps=400
num_accumulation_steps=3
save_best_model_only="True"
train_batch_size=4 # from 16 to 4
It works well on the Spider dataset, but when it comes to WikiSQL, I get the following error:
--------------------------
wandb: Tracking run with wandb version 0.8.30
wandb: Wandb version 0.10.21 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Run data is saved locally in wandb/run-20210303_163242-23o2pmxp
wandb: Syncing run wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2
wandb: ⭐️ View project at https://app.wandb.ai/zjy/smore-wikisql-group--final
wandb: 🚀 View run at https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
wandb: Run `wandb off` to turn off syncing.
2%|█▉ | 19/1200 [00:08<08:29, 2.32it/s]
Traceback (most recent call last):
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 407, in <module>
run_experiment(args)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 392, in run_experiment
train(sp)
File "/data/users/zjy/TabularSemanticParsing/src/experiments.py", line 63, in train
sp.run_train(train_data, dev_data)
File "/data/users/zjy/TabularSemanticParsing/src/common/learn_framework.py", line 208, in run_train
loss = self.loss(formatted_batch)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 94, in loss
outputs = self.forward(formatted_batch)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/learn_framework.py", line 129, in forward
decoder_ptr_value_ids=decoder_ptr_value_ids)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 59, in forward
transformer_output_value_masks)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 263, in forward
schema_hiddens = self.schema_encoder(schema_hiddens, feature_ids)
File "/data/users/zjy/anaconda3/envs/bridge/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/users/zjy/TabularSemanticParsing/src/semantic_parser/bridge.py", line 169, in forward
field_type_embeddings], dim=2))
RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 11 (The offending index is 0)
wandb: Waiting for W&B process to finish, PID 957961
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: Run summary:
wandb: _runtime 83.59444427490234
wandb: learning_rate/wikisql 0.0003
wandb: _step 1
wandb: _timestamp 1614789212.7551596
wandb: fine_tuning_rate/wikisql 1.6666666666666667e-08
wandb: Syncing files in wandb/run-20210303_163242-23o2pmxp:
wandb: code/src/experiments.py
wandb: plus 8 W&B file(s) and 1 media file(s)
wandb:
wandb: Synced wikisql.bridge.lstm.meta.ts.ppl-0.85.2.dn.no_from.feat.bert-large-uncased.xavier-1024-512-512-4-3-0.0003-inv-sqr-0.0003-3000-5e-05-inv-sqr-0.0-3000-0.3-0.3-0.0-0.0-1-8-0.1-0.0-res-0.2-0.0-ff-0.4-0.0.210304-003242.scz2: https://app.wandb.ai/zjy/smore-wikisql-group--final/runs/23o2pmxp
I also tried a train_batch_size of 2, but it still fails; the same error occurred when switching to a GeForce GTX Titan Xp (12GB) or a Tesla K80 (11GB). Any suggestions on what might be causing this, or what I can try to get rid of it? Thank you!
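For reference, the failing line concatenates the schema hidden states with the field type embeddings along the last dimension, so the error seems to mean the two tensors disagree on the number of schema items. A minimal snippet that triggers the same class of error (the shapes are illustrative, not the repo's real sizes):

import torch

# torch.cat along dim 2 requires dims 0 and 1 to match; here dim 1
# (the number of schema items) disagrees, mirroring "Got 9 and 11".
schema_hiddens = torch.randn(1, 9, 512)         # encoder produced 9 items
field_type_embeddings = torch.randn(1, 11, 16)  # features built for 11 items
torch.cat([schema_hiddens, field_type_embeddings], dim=2)
# RuntimeError: Sizes of tensors must match except in dimension ...
# (exact wording varies across PyTorch versions)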
I get the same error too. Haven't made any changes to the configuration.
Sorry, I introduced this bug with the checkpoint release.
A temporary fix for Spider is to comment out line https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L31 and uncomment line https://github.com/salesforce/TabularSemanticParsing/blob/main/src/utils/trans/bert_utils.py#L30.
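In other words, the swap toggles which symbol is used as the wildcard marker when the hybrid sequence is built. A rough sketch of the two lines after the fix (the identifier and token id here are assumptions for illustration; check the linked lines for the actual code):

# src/utils/trans/bert_utils.py (sketch only, not the verbatim source)
asterisk_marker = '[unused30]'  # L30: reserved vocab token, uncomment this
# asterisk_marker = '*'         # L31: literal '*', comment this out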
The issue is that the released pre-trained checkpoints use "*" in the hybrid sequence as the wildcard representation (our implementation treats the wildcard as a special column in the database), but the WikiSQL data is noisy and some text in the dataset contains "*", which causes the model to mis-estimate the number of columns in the database.
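To make the failure mode concrete: a literal "*" coming from a noisy cell value is indistinguishable from the wildcard marker in the hybrid sequence, so the schema encoder infers extra phantom columns while the per-column feature embeddings are still sized by the true schema, and the concatenation at bridge.py line 169 fails (e.g. 9 vs 11). A toy sketch of the counting problem (the helper below is hypothetical; the real segmentation logic lives in the repo):

def num_schema_items(tokens, marker='*'):
    # Hypothetical stand-in for segmenting the hybrid sequence:
    # every marker token is taken to start a new schema item.
    return sum(tok == marker for tok in tokens)

clean = ['[CLS]', 'how', 'many', 'wins', '[SEP]', '*', 'player', 'wins']
noisy = clean + ['3', '*']  # WordPiece splits a cell value like "3*" into '3', '*'

print(num_schema_items(clean))  # -> 1: just the wildcard column, as intended
print(num_schema_items(noisy))  # -> 2: the value's '*' is mistaken for a column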
I will push a more stable fix later.