sqlite_scanner icon indicating copy to clipboard operation
sqlite_scanner copied to clipboard

A question about performance

Open hs41-18 opened this issue 2 years ago • 1 comments

Hi. Many thanks for all the work you are doing.

I have a question about getting data from SQLite into Duckdb. In SQLite, I have tables/views with a massive amount of rows 10 billion, transactional data. What is the easiest way to copy those tables into Duckdb to perform some analytics?

hs41-18 avatar Nov 25 '23 01:11 hs41-18

My experience here;

I am using a python arrow UDF to process results from a query against the sqlite3 source. I'm seeing many small vectors of 1 or 2 records going through (as opposed to the 2048 you would expect at full capacity). I have an expensive setup in my UDF so it kills performance.

I wonder if this behavior would match with internal duckdb operations.

I get larger vectors when accessing the sqlite db in order (wrt the rowid), so maybe a copy wouldn't suffer from the performace penalty

Is there a way to control the vector size here?

Nintorac avatar Jan 05 '24 13:01 Nintorac