Table-Pretraining

Tapex performance on large tables

Open srewai opened this issue 3 years ago • 1 comments

Hello team, I have CSV files with more than 30K rows and 10 columns. I need your help to understand the best way to approach this problem, since the table is too large. Is TAPEX able to handle large tables, or are there workarounds that I can adopt?

The first thought that comes to mind is splitting the huge table into multiple sub-tables (the column names stay the same) and then performing inference on each one. The predictions can then be aggregated into the final result. Could you please share your expertise on this problem? In general, how is table QA performed on large tables? Please suggest.
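The splitting idea above can be sketched roughly as follows. This is only an illustration: `answer_fn` is a hypothetical wrapper around whatever table QA model is used (it is not part of TAPEX or any library), and `rows_per_chunk` is an assumed value that would need tuning to the model's input limit. The right aggregation also depends on the question: a count can be summed across sub-tables, while an average or a max cannot be combined that simply.

```python
import csv
import io


def iter_sub_tables(csv_file, rows_per_chunk=200):
    """Yield (header, rows) pairs; every sub-table repeats the same header."""
    reader = csv.reader(csv_file)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield header, chunk
            chunk = []
    if chunk:  # last, possibly shorter, sub-table
        yield header, chunk


def answer_over_chunks(csv_file, question, answer_fn, rows_per_chunk=200):
    """Collect one prediction per sub-table.

    answer_fn(question, header, rows) is a hypothetical stand-in for a
    call to a table QA model on one small table.
    """
    return [
        answer_fn(question, header, rows)
        for header, rows in iter_sub_tables(csv_file, rows_per_chunk)
    ]


# Toy stand-in for a model: counts rows whose "city" column equals "Paris".
def toy_count_paris(question, header, rows):
    city_idx = header.index("city")
    return sum(1 for row in rows if row[city_idx] == "Paris")


sample = io.StringIO(
    "city,product\nParis,tea\nRome,coffee\nParis,coffee\nOslo,tea\nParis,milk\n"
)
partial_counts = answer_over_chunks(
    sample, "How many rows are from Paris?", toy_count_paris, rows_per_chunk=2
)
total = sum(partial_counts)  # summing is only valid for count-style questions
```

With a real model in place of `toy_count_paris`, each sub-table prediction can be wrong on its own, so errors may accumulate during aggregation.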

Additionally, can we list/view the corresponding SQL queries of predicted answers?

srewai avatar Oct 06 '22 12:10 srewai

@srewai Hi! Thanks for the very interesting question (yes, it is quite challenging). I think the best practice may be to use a text-to-SQL parser instead of the current text-to-answer models on tables/databases. Similar to previous work (e.g., TaPaS), TAPEX cannot accommodate large tables, since it requires the whole table to be linearized into the model's length-limited input. That is not the case for current text-to-SQL parsers. Therefore, I would recommend using UniSAR as the backbone to tackle your problem. Hope that helps!
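To illustrate why the text-to-SQL route scales, here is a minimal sketch using only Python's standard library. It assumes the CSV is loaded into an in-memory SQLite table (the table name `data` is made up for this example), and `predicted_sql` is a hard-coded placeholder for what a text-to-SQL parser would output; no real parser API is called here. Since the parser only needs the question and the schema, the number of rows does not affect the model input, and the predicted SQL itself can be logged and inspected.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_file, table_name="data"):
    """Load a CSV with a header row into an in-memory SQLite table.

    Columns are created without types, so every value is stored as text;
    numeric comparisons would need an explicit CAST in the SQL.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )
    conn.commit()
    return conn


sample = io.StringIO("city,product\nParis,tea\nRome,coffee\nParis,coffee\n")
conn = load_csv_into_sqlite(sample)

# Placeholder for the output of a text-to-SQL parser given the question
# "How many rows are from Paris?" and the table schema.
predicted_sql = "SELECT COUNT(*) FROM data WHERE city = 'Paris'"

print(predicted_sql)  # the query is visible, unlike in text-to-answer models
answer = conn.execute(predicted_sql).fetchone()[0]
print(answer)
```

The database engine executes the query over all rows, so a 30K-row table is handled the same way as a 3-row one.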

Best, Qian

SivilTaram avatar Oct 07 '22 14:10 SivilTaram

@SivilTaram, I genuinely appreciate your quick help and for pointing me in the right direction :). I was also looking at the hybrid ranking paper by Microsoft. It seems promising too. Just wondering whether you have any experience with this paper :).

srewai avatar Oct 10 '22 10:10 srewai

@srewai Not yet, maybe because it is for WikiSQL haha. But I think it is worth a try! Also, I think dense retrieval may be one potential solution :)

SivilTaram avatar Oct 11 '22 08:10 SivilTaram

Closed due to inactivity. Feel free to re-open it :)

SivilTaram avatar Oct 18 '22 15:10 SivilTaram

@SivilTaram Sorry, I am still sick with COVID :/. I had a look at the UniSAR paper; it looks promising, with impressive results. Is it SOTA for text-to-SQL as of now? Or are there any other papers that you would suggest? We want to build an enterprise-level text-to-SQL system. You're amazing, thanks for your help :)

srewai avatar Oct 18 '22 16:10 srewai