
Results not as good as in paper

Open ThomasLecat opened this issue 7 years ago • 3 comments

Hi Xiaojun,

I trained the model without changing any hyperparameter values (python train.py --ca).

When executing test.py, I obtain the following accuracy scores:

Dev acc_qm: 0.584253651585;
  breakdown on (agg, sel, where): [0.90048688 0.91307446 0.68459803]
Dev execution acc: 0.654435340221
Test acc_qm: 0.571671495151;
  breakdown on (agg, sel, where): [0.90212873 0.90370324 0.67092833]
Test execution acc: 0.641768484696
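
Side note for anyone reading along: the two numbers measure different things. Here is a conceptual sketch with hypothetical helper names, not the repo's actual evaluation code:

```python
# acc_qm ("query match"): does the predicted SQL match the gold SQL at
# the logical-form level? WikiSQL-style evaluation compares the
# aggregation, the selected column, and the set of WHERE conditions,
# so two queries with reordered conditions still count as a match.
def acc_qm(pred, gold):
    return (pred["agg"] == gold["agg"]
            and pred["sel"] == gold["sel"]
            and set(pred["conds"]) == set(gold["conds"]))

# Execution accuracy: do the two queries return the same answer when
# actually run against the table? A mispredicted query can still
# execute to the right answer, which is why execution accuracy is
# usually a few points above acc_qm (0.654 vs 0.584 on dev here).
def acc_ex(pred_sql, gold_sql, execute):
    # `execute` is a hypothetical callable that runs a SQL string
    # against the table and returns its result set.
    return execute(pred_sql) == execute(gold_sql)
```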

These results are several points below the ones reported in your paper.

Although you do not report Acc_qm and Acc_ex for your model when the word embedding isn't allowed to train, you mention in section 4.3 that the improvement is about 2 points when training the word embedding. After subtracting these 2 points from the results reported in Table 1, there is still a 2-3 point difference between my results and yours.
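
For concreteness, here is a minimal PyTorch sketch of what "allowing the word embedding to train" means; the variable names are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained GloVe matrix SQLNet loads:
# a (vocab_size, emb_dim) float tensor.
vocab_size, emb_dim = 10000, 300
glove_vectors = torch.randn(vocab_size, emb_dim)

embedding = nn.Embedding.from_pretrained(glove_vectors)

# Default run: embeddings stay frozen at their GloVe values.
embedding.weight.requires_grad = False

# "Trainable embedding" run: embeddings are fine-tuned together with
# the rest of the model, which section 4.3 of the paper credits with
# roughly a 2-point gain.
embedding.weight.requires_grad = True
```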

My question is: Are the results reported in the paper the best ones you obtained after running the whole training procedure multiple times? In that case, were the results obtained on average closer to mine or to yours? How many times did you run the training procedure to obtain those results?

Thanks, Thomas

ThomasLecat avatar Feb 07 '18 16:02 ThomasLecat

Hi, for the paper I only ran the training twice and took the better of the two models. It's quite surprising to see that your accuracy is 2-3 points lower. Comparing the breakdown with Table 2 in the paper, you can see that it is the WHERE-clause part that underperforms. During training, I also observed that the WHERE module sometimes converges to a local minimum where its accuracy stays low. Usually, if you train it another time, the performance will be better.
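
A minimal sketch of that retraining strategy (random restarts with different seeds, keeping the run with the best dev WHERE accuracy); train_one_run is a stand-in for a full SQLNet training run, not a function the repo exposes:

```python
import random
import numpy as np
import torch

def set_seed(seed):
    # Re-seed every RNG so each restart begins from a different
    # initialization and data order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def train_one_run():
    # Placeholder: train the model to completion and return the dev
    # WHERE-clause accuracy. Simulated here with a random number so
    # the sketch runs on its own.
    return random.random()

best_acc, best_seed = -1.0, None
for seed in (0, 1, 2):
    set_seed(seed)
    where_acc = train_one_run()
    if where_acc > best_acc:
        best_acc, best_seed = where_acc, seed

print(f"best dev WHERE accuracy {best_acc:.3f} with seed {best_seed}")
```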

xiaojunxu avatar Feb 12 '18 03:02 xiaojunxu

The SQLNet paper mentions that training was done for 200 epochs, but in train.py I find the default number of epochs set to 100. Could that be the reason for the lower accuracy that Thomas observed?
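
If that is the cause, the fix would just be to raise the epoch count before training. A hypothetical sketch, assuming the count is wired through argparse (the actual flag and variable names in train.py may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: the paper trains for 200 epochs, while the
# repo's default reportedly sits at 100.
parser.add_argument('--epoch', type=int, default=200)
args = parser.parse_args()

for epoch in range(args.epoch):
    ...  # one full pass over the WikiSQL training set per iteration
```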

armanabraham avatar May 05 '18 13:05 armanabraham

Hey Thomas, can you help me run this code? I am trying to execute train.py --ca, but my PC hangs after around 20-25 minutes. My machine has 32 GB of RAM, an NVIDIA 2080 GPU, and a 9th-gen i7. What could be the possible reasons for this issue?

jaydeepb-inexture avatar Feb 06 '20 05:02 jaydeepb-inexture