sqlx icon indicating copy to clipboard operation
sqlx copied to clipboard

[SQLite] Expose sqlite3_serialize and sqlite3_deserialize

Open mehcode opened this issue 5 years ago • 1 comments

https://www.sqlite.org/c3ref/serialize.html

https://www.sqlite.org/c3ref/deserialize.html

mehcode avatar Mar 30 '20 07:03 mehcode

My use case is following: Consider GPG encrypted SQLite file e.g. mydatabase.sqlite.gpg

I will open it with GPG and pass the bytestream to sqlite3_deserialize to open the database in-memory. After I'm done with modifications I will use sqlite3_serialize get the bytestream, and GPG encrypt it back to disk.

Without deserialize/serialize support one must first unencrypt the file to disk, and it's not very secure.

Ciantic avatar Jun 14 '24 10:06 Ciantic

This isn't entirely trivial because sqlite3_deserialize() has a major caveat we need to be aware of when wrapping it:

Applications must not modify the buffer P or invalidate it before the database connection D is closed.

We can't just pass a &[u8] without some sort of wrapper type tying the lifetime of the buffer to the lifetime of the connection, but that wrapper also can't just implement DerefMut<Target = SqliteConnection> because someone could mem::replace() that connection with a different one, leading to a use-after-free in safe code. It'd require a completely unique type, or adding a lifetime parameter to SqliteConnection which would be hugely inconvenient.

This means the simplest API needs to just take ownership of the passed-in buffer and ensure that it cannot be invalidated before the connection is closed. If we accept something like Vec<u8> or Box<[u8]>, we need to be very careful to ensure the correct drop order.

Alternatively, we could create a wrapper around a buffer allocated through SQLite and pass it to sqlite3_deserialize with SQLITE_DESERIALIZE_FREEONCLOSE, effectively giving SQLite ownership of it. Then we don't have to worry how it's freed. This is also required to set SQLITE_DESERIALIZE_RESIZABLE so that it's actually possible for the database to grow during use. The hardest part of this would be creating a useful enough API around it, on-par with Vec<u8> or bytes::BytesMut.

This is actually an important example of why we don't just use rusqlite, because while it has wrappers specifically for this purpose, there isn't actually any way to build this from arbitrary bytes besides a single unsafe method. I cannot imagine how you're reasonably meant to import an externally serialized database with this API. The crate is full of design decisions like this that I just don't agree with.

The actual serialize/deserialize API is pretty straightforward. SqliteConnection gains the following methods:

impl SqliteConnection {

    pub async fn deserialize(
        &mut self, 
        schema: &str,
        data: Vec<u8>, // or our bespoke buffer type
        read_only: bool // sets `SQLITE_DESERIALIZE_READONLY`
    ) -> sqlx::Result<()> { 
        // ...
    }

    pub async fn serialize(
        &mut self,
        schema: &str,
    ) -> sqlx::Result<Vec<u8>> { // or buffer type
        // ...
    }
}

It actually makes sense for these to exist as async fns on SqliteConnection itself because they both potentially involve non-trivial work that could otherwise block the runtime thread, so it's better to move these onto the worker thread. serialize() especially because it might have to read from the database file on-disk.

Implementing a wrapper around SQLite-allocated buffers actually makes a lot of sense here, because sqlite3_serialize() will return such a buffer, so if we wanted to return a Vec<u8> or the like, we'd actually have to copy the whole serialized database.

I was thinking deserialize() almost makes more sense as a constructor for SqliteConnection, but it actually attaches the deserialized database as the named schema, so you could have a connection with an open database for the main schema, and then attach the deserialized database as a secondary schema. For example, if you wanted a read-only database to be set as a secondary schema for all connections in a SqlitePool, you could add a call to .deserialize() in an after_connect callback.

(Fun fact, it literally executes an ATTACH statement to attach the memory database. x must be a special expression in this case. Also, looking at the code, you can actually pass NULL for the schema name instead of "main" for the primary schema, but this is not documented.)

Additionally, we can add the following to LockedSqliteHandle:

impl LockedSqliteHandle<'_> {
    // Passes `SQLITE_SERIALIZE_NOCOPY`
    pub fn serialize_nocopy<'a(&'a self, schema: &str) -> sqlx::Result<Option<&'a [u8]>> { /* ... */ }
}

Because SQLITE_SERIALIZE_NOCOPY actually causes sqlite3_serialize() to return the pointer it's currently using for storage (previously passed to sqlite3_deserialize()), it is only guaranteed to remain valid as long as the database handle is locked out. It's invalidated the next time the database is used (the docs specify a "write operation" but we don't distinguish those in the API). So it needs to be tied to the lifetime of the guard, not the connection itself.

abonander avatar Feb 12 '25 02:02 abonander

I would also be interested in a possible solution. Currently I'm trying to deserialize a read-only database into memory like this:

async fn sqlite_deserialize(db_bytes: Vec<u8>) -> Result<Pool<Sqlite>> {
    // Create in-memory SQLite database
    let pool = SqlitePoolOptions::new()
        .connect("sqlite::memory:")
        .await?;
    let mut conn = pool.acquire().await?;

    // Deserialize the uploaded data into the in-memory database
    let buf = SqliteOwnedBuf::try_from(db_bytes.as_slice())?;
    conn.deserialize(None, buf, true).await?;

    Ok(conn)
}

This seems to work, but when I try to use the resulting pool I get error error returned from database: (code: 1) no such table: my_table.

Do you have any idea how to make it work?

zebrapurring avatar Apr 28 '25 18:04 zebrapurring

There are some examples in tests: https://github.com/launchbadge/sqlx/pull/3745/files#diff-2a37c1ef7bfc67ec2bd41e4ff642f888870edecb94407a583e617fb162d1d3e1R1202

mattrighetti avatar May 08 '25 23:05 mattrighetti