FTS5 (full-text search) support
FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications.
Is there a way to use it with sqlite-xsv? I couldn't find a way to use using csv with using fts5.
If not can some support be added?
I currently do:
.load ./xsv0
select xsv_version();
create virtual table temp.students using csv(filename="students.csv");
CREATE VIRTUAL TABLE temp.studentsfts5 USING fts5(column1, column2, column3, column4);
INSERT INTO temp.studentsfts5 SELECT * FROM temp.students;
which for my dataset(2million records, 22 columns) takes a 40 seconds while xsv only took 1 second.
Just trying to see if I can leverage this amazingly fast library.
That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.
One approach to confirm this: insert the CSV data into another table first.
.timer on
.load ./xsv0
create virtual table temp.csv_students using csv(filename="students.csv");
create table students as select * from temp.csv_students;
create virtual table fts_students using fts5(col1, col2, ...);
insert into fts_students select * from students;
The .timer on here is key - it'll show how long each step will take. My guess is that the create table students as ... should take ~1-3 seconds, since it's directly reading the CSV into a regular table with sqlite-xsv. My guess is that the last line, insert into fts_students, will take a long time (>30 seconds), because it's building the FTS5 index from scratch.
Let me know how it goes!
That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.
You are correct. I had confirmed it already that was the insert command that was taking time. Sorry about not making it clear earlier.
I was trying to find if there was a fast alternative way to do it.