feature: Add Remote (HTTP(S)) Support for SQLite Databases
Remote (HTTP(S)) Support for SQLite Databases
This PR adds support for querying remote SQLite databases over HTTP/HTTPS to the SQLite scanner extension, leveraging DuckDB's CachingFileSystem.
Features
- Query SQLite databases directly from HTTP/HTTPS URLs without downloading the entire file
- Seamless integration with
sqlite_scan()andATTACHsyntax - Adaptive read-ahead optimization (1MB-128MB) reduces network round trips
Implementation
-
SQLiteDuckDBCacheVFS: Custom SQLite VFS that delegates file I/O to DuckDB's CachingFileSystem
- Integrates with DuckDB's external file cache for efficient block caching
-
DuckDBCachedFile: Wrapper around DuckDB's CachingFileHandle with adaptive read-ahead
- Dynamically adjusts read-ahead size based on access patterns
-
Integration:
- Leverages DuckDB's httpfs extension for HTTP client functionality
Usage
-- Load required extensions
LOAD sqlite_scanner;
INSTALL httpfs;
LOAD httpfs;
-- Query remote SQLite database
SELECT * FROM sqlite_scan('https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite', 'Artist');
-- Attach as a database
ATTACH 'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite' AS remote_db (TYPE sqlite);
SELECT * FROM remote_db.Album;
Related Issues: #39, #141
[Updated]
@Maxxen Thanks for the excellent suggestions! I've re-implemented based on your feedback.
Integrated with DuckDB's filesystem and caching infrastructure
- Now wraps DuckDB's FileSystem API for all remote file access (HTTP/HTTPS via httpfs)
- Uses DuckDB's CachingFileSystem which provides intelligent block caching
- Cache is managed by DuckDB's CachingFileSystem infrastructure
Simplified concurrency model
- Removed per-file mutexes - file operations are lock-free
- Only uses mutex for VFS registry operations (registration/unregistration)
- Relies on CachingFileSystem's internal synchronization for thread safety
Adaptive read-ahead optimization
- Implements dynamic read-ahead (1MB-128MB) to reduce network round trips
- Adjusts read size based on sequential vs random access patterns
- Works on top of CachingFileSystem's block caching
The implementation is much cleaner now - it properly delegates to DuckDB's existing infrastructure rather than reimplementing caching logic. Thanks again for the great foundation and suggestions!
This is very cool! Does it work with hosted SQLite solutions like Turso?
This is very cool! Does it work with hosted SQLite solutions like Turso?
Thank you @Alex-Monahan! I wasn't familiar with Turso, but it seems to implement a custom wire format rather than providing HTTP-like access to the stored file format, so regrettably, it likely wouldn't.
I've reorganized this PR into 5 logical commits to attempt to make the review process easier. Each commit is self-contained, compiles, and passes all tests:
Commit 1: Add ClientContext parameter to SQLiteDB::Open methods
- Minimal API change to support future extensibility
- Required for subsequent commits that need access to DuckDB's context
Commit 2: Lazy initialization when opening remote SQLite databases
- Implements lazy initialization to prevent deadlocks during remote file access
- Includes necessary validation logic for busy_timeout
Commit 3: Add SQLite VFS implementation for remote file support
- Core VFS implementation that integrates with DuckDB's CachingFileSystem
- Handles HTTP error mapping and adaptive read-ahead optimization
- No user-facing changes yet
Commit 4: Enable HTTP/HTTPS SQLite database access via sqlite_scan
- Activates HTTP/HTTPS support for sqlite_scan() function
- Includes 7 working tests demonstrating basic functionality
- ATTACH support intentionally deferred to next commit
Commit 5: Add HTTP ATTACH support and improve error handling
- Completes the implementation with ATTACH functionality
- Improves error handling throughout the PR
- Adds remaining tests for complex queries
Let me know if you'd like me to adjust the commit structure or if you have any questions about specific changes or feedback.
Curious if this is blocked by something? Would love to help get this merged if any additional contributions are necessary