sqlite-xsv icon indicating copy to clipboard operation
sqlite-xsv copied to clipboard

FTS5 (full-text search) support

Open Anutrix opened this issue 2 years ago • 2 comments

FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications.

Is there a way to use it with sqlite-xsv? I couldn't find a way to use using csv with using fts5.

If not can some support be added?

I currently do:

.load ./xsv0
select xsv_version();
create virtual table temp.students using csv(filename="students.csv");
CREATE VIRTUAL TABLE temp.studentsfts5 USING fts5(column1, column2, column3, column4);
INSERT INTO temp.studentsfts5 SELECT * FROM temp.students;

which for my dataset(2million records, 22 columns) takes a 40 seconds while xsv only took 1 second.

Just trying to see if I can leverage this amazingly fast library.

Anutrix avatar Feb 06 '24 14:02 Anutrix

That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.

One approach to confirm this: insert the CSV data into another table first.

.timer on
.load ./xsv0
create virtual table temp.csv_students using csv(filename="students.csv");

create table students as select * from temp.csv_students;

create virtual table fts_students using fts5(col1, col2, ...);

insert into fts_students select * from students;

The .timer on here is key - it'll show how long each step will take. My guess is that the create table students as ... should take ~1-3 seconds, since it's directly reading the CSV into a regular table with sqlite-xsv. My guess is that the last line, insert into fts_students, will take a long time (>30 seconds), because it's building the FTS5 index from scratch.

Let me know how it goes!

asg017 avatar Feb 06 '24 16:02 asg017

That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.

You are correct. I had confirmed it already that was the insert command that was taking time. Sorry about not making it clear earlier.

I was trying to find if there was a fast alternative way to do it.

Anutrix avatar Feb 06 '24 16:02 Anutrix