
Tokenization script

Open whwang299 opened this issue 7 years ago • 0 comments

Hi @xiaojunxu

Could you upload your tokenization script? I found that "question" and "query_tok" sometimes differ. For example, in the 25th example in dev_tok.jsonl:

  • "question": "What is the district when the total amount of trees is smaller than 150817.6878461314 and amount of old trees is 1,928 (1.89%)?",
  • However, "query_tok" is: ["SELECT", "district", "WHERE", "total", "amount", "of", "trees", "LT", "150817.687846", "AND", "amount", "of", "old", "trees", "EQL", "1,928", "(", "1.89", "%", ")"],

As you can see, the floating-point number differs: 150817.6878461314 in "question" becomes 150817.687846 in "query_tok". If possible, I would like to modify the tokenization script so that the two match.
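For reference, the truncated value has exactly 12 significant digits, which is what you get if the number is parsed as a float and printed the way Python 2's `str()` prints floats. This is only a guess, since the original script isn't available. Below is a minimal sketch, assuming that cause: it reproduces the truncation and restores the original number strings from the question. `to_12g` and `restore_numbers` are hypothetical helpers, not part of SQLNet.

```python
import re

question = ("What is the district when the total amount of trees is smaller than "
            "150817.6878461314 and amount of old trees is 1,928 (1.89%)?")
query_tok = ["SELECT", "district", "WHERE", "total", "amount", "of", "trees", "LT",
             "150817.687846", "AND", "amount", "of", "old", "trees", "EQL",
             "1,928", "(", "1.89", "%", ")"]


def to_12g(value: str) -> str:
    # Assumed cause: float round-trip, then formatting with 12 significant
    # digits (the precision Python 2's str() uses for floats).
    return format(float(value), ".12g")


def restore_numbers(tokens, question):
    """Replace tokens that are 12-significant-digit renderings of a decimal
    literal in the question with that literal, exactly as written."""
    originals = re.findall(r"\d+\.\d+", question)  # decimal literals as written
    fixed = []
    for tok in tokens:
        # List membership is an exact match, not a substring test.
        if tok not in originals:
            for num in originals:
                if to_12g(num) == tok:
                    tok = num
                    break
        fixed.append(tok)
    return fixed


print(to_12g("150817.6878461314"))  # 150817.687846
print(restore_numbers(query_tok, question)[8])  # 150817.6878461314
```

Matching against the exact literals in the question (rather than a substring search) matters here, because "150817.687846" is itself a substring of "150817.6878461314".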

Thanks!

whwang299, Jan 04 '19 09:01