SQLNet
Tokenization script
Hi @xiaojunxu
Could you upload your tokenization script?
I ask because I have found that "question" and "query_tok" sometimes differ.
For example, in the 25th example of dev_tok.jsonl:
- "question": "What is the district when the total amount of trees is smaller than 150817.6878461314 and amount of old trees is 1,928 (1.89%)?",
- "query_tok": ["SELECT", "district", "WHERE", "total", "amount", "of", "trees", "LT", "150817.687846", "AND", "amount", "of", "old", "trees", "EQL", "1,928", "(", "1.89", "%", ")"]
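As you can see, the float 150817.6878461314 in the question appears as 150817.687846 in "query_tok", i.e. it has been cut to 12 significant digits. If possible, I would like to modify the tokenization script so that numbers are kept exactly as written.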
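In the meantime, here is a minimal sketch for finding affected examples. It assumes (this is only a guess, not the repo's actual tokenizer) that the number was formatted with 12 significant digits, which is what `"%.12g"` does and what Python 2's `str()` does for floats. The helpers `format_12_sig_digits`, `find_number_mismatches` and `scan_jsonl` are hypothetical names for illustration; only the `question` and `query_tok` fields come from the data.

```python
import json
import re

# Matches plain integers and decimals such as "1.89" or "150817.6878461314".
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def format_12_sig_digits(x):
    # Hypothetical reproduction of the truncation: keep 12 significant digits,
    # like Python 2's str(float) (assumed cause, not confirmed).
    return "%.12g" % x


def find_number_mismatches(question, query_tok):
    """Return numeric tokens in query_tok that are not whole numbers in question."""
    question_numbers = set(NUMBER_RE.findall(question))
    # Compare whole numbers, not substrings: "150817.687846" is a prefix of
    # "150817.6878461314", so a plain `in question` check would miss it.
    return [tok for tok in query_tok
            if NUMBER_RE.fullmatch(tok) and tok not in question_numbers]


def scan_jsonl(path):
    """Yield (0-based line index, mismatched tokens) for each affected example."""
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            ex = json.loads(line)
            bad = find_number_mismatches(ex["question"], ex["query_tok"])
            if bad:
                yield i, bad
```

Running `for i, bad in scan_jsonl("dev_tok.jsonl"): print(i, bad)` should list the example above (at index 24, since indices start at 0) along with any others that have the same issue. Tokens like "1,928" are skipped because they are not plain numbers.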
Thanks!