sqlite: add support for SQLite Session Extension
We were talking about adding support for the Session Extension of SQLite in #53752. This is not a run-time extension but rather a feature built in to the SQLite amalgamation. It can be enabled with a compile flag and is enabled by default on some platforms.
I have never contributed to Node.js before so I will probably need some help to bring this to a conclusion.
TODO:
- [x] Make sure sessions are deleted before the database they are attached to is closed
- [ ] Consensus on API
- [x] Documentation
- [x] Allow custom conflict handler
- [x] ~Throw with specific (documented) exception in case of conflict when applying changeset~ since SQLite doesn't consider this to be an error I return
falsewhen applying the changeset is aborted due to a conflict - [x] Allow generating a patchset as well
- [x] Allow specifying table name when creating session (to only track that table)
- [x] Implement Session.close()
Example usage:
const database1 = new DatabaseSync(':memory:');
const database2 = new DatabaseSync(':memory:');
database1.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');
database2.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');
const session = database1.createSession();
const insert = database1.prepare('INSERT INTO data (key, value) VALUES (?, ?)');
insert.run(1, 'hello');
insert.run(2, 'world');
const changeset = session.changeset();
database2.applyChangeset(changeset);
// Now database2 contains the same data as database1
Review requested:
- [ ] @nodejs/gyp
- [ ] @nodejs/security-wg
Wouldn't this be solved if users could use their own SQLite build? That's already being considered according to the original issue.
@karimfromjordan Nope, because you need access to the C API to access this feature.
Can you add the appropriate documentation as well?
Codecov Report
Attention: Patch coverage is 86.06965% with 28 lines in your changes missing coverage. Please review.
Project coverage is 88.41%. Comparing base (
7b01758) to head (092a28c). Report is 125 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/node_sqlite.cc | 86.93% | 6 Missing and 20 partials :warning: |
| src/node_sqlite.h | 0.00% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #54181 +/- ##
==========================================
+ Coverage 88.40% 88.41% +0.01%
==========================================
Files 654 654
Lines 187594 188034 +440
Branches 36089 36169 +80
==========================================
+ Hits 165834 166259 +425
+ Misses 15001 14999 -2
- Partials 6759 6776 +17
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/node_sqlite.h | 70.00% <0.00%> (-7.78%) |
:arrow_down: |
| src/node_sqlite.cc | 83.80% <86.93%> (+0.47%) |
:arrow_up: |
- Flaky Tests Detection - Detect and resolve failed and flaky tests
- JS Bundle Analysis - Avoid shipping oversized bundles
I need to make sure sessions are deleted before the database they are attached to is closed.
I think I need the DatabaseSync hold onto the Sessions it creates. Both are BaseObjects so I need to look into how to do that.
When a Session goes out of scope the corresponding DatabaseSync needs to no longer hold onto it. Conversely I also need to make sure that all operations on Session objects are no-ops after the corresponding database is closed.
@anonrig I have added documentation, most of the functionality and fixed the lint issues. Could you trigger another CI run?
I want to add support for generating a patchset and then this PR is ready for review.
I have some ideas for future improvements, e.g. more fine-grained conflict handling or exposing the utilities SQLite provides for inspecting and merging changesets, but they can be added in a backward-compatible manner.
This is excellent work that many projects will surely be based on. I am curious about the eventual wrapping of the entire Sessions C API (mixed & mashed into higher-level Node functions). Does this PR leave space for that? There are a few more Sessions API functions that would be of great use in Node.
@TheOneTheOnlyJJ Thanks! I agree that exposing more functionality of the Session Extension in Node.js would be great. Here are some ideas:
- Wrappers for
sqlite3changeset_invert()(creates a changeset that undoes the changes in another changeset) andsqlite3changeset_concat()(combines changesets). - Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.
- Allowing the contents of a changeset to be read. You probably want to reuturn a
Generatorfor this so you don't need to read the entire changeset it in memory at once. - Change the
onConflicthandler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change. - A wrapper for
sqlite3session_diff(). - Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.
Here is the full API for future reference: https://www.sqlite.org/session/funclist.html
Right now I first want to gather some feedback, but I'd be interested continue working on this, either in a follow-up PR or in this one. Since the SQLite integration is still under active development I think working incrementally is the preferred approach.
Good point, here's my view on them:
- Wrappers for
sqlite3changeset_invert()(creates a changeset that undoes the changes in another changeset) andsqlite3changeset_concat()(combines changesets).
Yes, I was about to suggest these. There's also sqlite3session_isempty() and sqlite3session_memory_used() which are simpler (compared to the rest of the API) and could be wrapped more easily.
- Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.
This should be a high priority right now, as long-running Node processes (like server back-ends) that would use Sessions will have their memory slowly but surely filled up with no way of freeing it without closing the database connection. Applications could run for months or even longer without closing a database connection, so not being able to close Sessions and free their memory would deter developers from using them. The database should, of course, still keep a list of all active sessions and delete them when it is closed, just in case not all Sessions were manually closed. From my point of view, this function should be wrapped as soon as possible and be included in this PR. Not having it surely decreases the chances of getting this merged, and it would be a shame to lose out on this.
- Allowing the contents of a changeset to be read. You probably want to reuturn a
Generatorfor this so you don't need to read the entire changeset it in memory at once.- Change the
onConflicthandler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change.- A wrapper for
sqlite3session_diff().
Very important functionalities, but again, they're not vital for achieving functional basic interaction with the extension. These 3 points could be bundled in a follow-up PR that handles the finer-grained data management side of the API. The changegroup could be exposed as an object with its API functions bound to it as methods.
- Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.
I've noticed this in the documentation and it's definitely a part of the API that would be crucial in large apps with large databases. But again, for the current goal of incrementally proposing a stable API that exposes the API of the extension, this is not required right now. It would also require wrapping sqlite3session_config(), which allows configuring SQLITE_SESSION_CONFIG_STRMSIZE. This could be handled after the 3 points above get sorted out.
Also, before I wrap up...
I noticed that you proposed the createSession() method as a wrapper around both sqlite3session_create() and sqlite3session_attach(), combined. Looking at the API, I'm trying to figure out if there's any potential functionality or use case where exposing these functions separately in Node would make sense. The only potential use case I see is creating Sessions in one part of the application and then attaching them somewhere else. Maybe this would be useful if using a global Session that gets created at app start and attached later in the app life cycle? But, given that a Session cannot be unattached, this is likely not going to be useful in any way, so the separation of the functions is not needed. What's you input on this, do you see any potential use cases for the splitting of these 2 functions?
This should be a high priority right now
Agreed. I will still add this one as part of this PR.
What's you input on this, do you see any potential use cases for the splitting of these 2 functions?
I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:
better-sqlite3's public API must be as simple as possible. Rather than calling 3 functions in a specific order, it's simpler for users to call a single function. Rather than providing many similar functions for doing similar things (e.g., "convenience functions"), there should just be one function that is already convenient by design. Sane defaults should be applied when possible. A function's minimal call signature should be as small as possible, with progressively complex customization available when needed. Function names should only be as long as necessary to convey their purpose. For any new feature, it should be easy to showcase code examples that is are so simple that they are self-explanatory.
In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.
Agreed. I will still add this one as part of this PR.
Yes, this would be ideal. For further updates other than this, I believe it's best to wait for the final form of the node:sqlite API itself.
I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:
I was checking if any additional functionality could be squeezed out by separating the functions. Given that there is none, your decision is a sensible one and I support it.
In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.
This is also the best sane default option, as every developer should customize this behavior according to their use case.
I'm not experienced in the internals of Node, so my contribution here can only amount to discussing the API design and smaller implementation details.
I guess the next step now (after implementing the Session close wrapper) is to wait for a maintainer to review this?
FWIW sqlite is now a valid subsystem
The Session close wrapper should be added to the TODO list at the top of the Issue.
@TheOneTheOnlyJJ Added TODO and the code. 👍
I merged main back in and made use of the new DatabaseSync::IsOpen method.
Could @anonrig or @cjihrig perhaps trigger a CI run?
CI: https://ci.nodejs.org/job/node-test-pull-request/61380/
@cjihrig There was a merge conflict on CI for some reason. I merged main back in again, could you trigger another run? 🙂
CI: https://ci.nodejs.org/job/node-test-pull-request/61418/
Not sure why CI is reporting a merge conflict:
17:44:52 HEAD is now at d5316e8426 Merge remote-tracking branch 'origin/main' into session-extension
17:44:52 + git status
17:44:52 HEAD detached at d5316e8426
17:44:52 Untracked files:
17:44:52 (use "git add <file>..." to include in what will be committed)
17:44:52 env.properties
17:44:52
17:44:52 nothing added to commit but untracked files present (use "git add" to track)
17:44:52 + git rev-parse HEAD
17:44:52 d5316e8426b89634a76a9926cfa887488b9d4523
17:44:52 + git rev-parse origin/main
17:44:52 e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:52 + [ -n origin/main ]
17:44:52 + git rev-parse origin/main
17:44:52 + REBASE_ONTO=e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:52 + git rebase --committer-date-is-author-date e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:53 Rebasing (1/33)
17:44:53 Auto-merging src/node_sqlite.cc
17:44:53 Auto-merging src/node_sqlite.h
17:44:53 CONFLICT (content): Merge conflict in src/node_sqlite.h
17:44:53 Auto-merging test/parallel/test-sqlite.js
17:44:53 error: could not apply bc951f733b... src, test: support for SQLite Session Extension
I pinged the build team.
Edit: turns out merge commits are not supported.
Edit 2: I managed to fix it with the help of the build team 🎉
@cjihrig CI was passing, but I just had to rebase again.
Would love to get some feedback on this PR. 🙏
I can understand if you want to hold off with this until the core functionality has matured.
Would love to get some feedback on this PR.
I'll give the code a look. I think we need a @nodejs/sqlite team that can be pinged on PRs like this. I don't want to become a blocker (I don't currently have a lot of free time for Node, and the time I do have leading up to v23 will be largely spent on the test runner). I think it would also be good for getting more opinions on what to include in the API.
Pinging some people that reviewed SQLite PRs in the past. @fhinkel @jasnell @benjamingr @anonrig @H4ad @aduh95 @atlowChemi @RedYetiDev @tniessen @targos
Feedback on the API, this functionality or the code would be very welcome.
- Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.
Just for reference in the future, this part of the API could get heavy inspiration from #54213, once it gets finalised, approved & merged.
Is there any update on this?
@TheOneTheOnlyJJ Working on the PR comments. I intend to get this merged!
I am sorry to intrude into this conversation, but I have been working on creating a standardized interface for SQL based databases for JS (runtime agnostic) for a little while. Since I see that the API for node:sqlite has not yet reached consensus, I was hoping to join the discussion and see if an API that works for multiple database drivers can be collaborated on. If this would be of interest, I would be happy to join a conversation, or you can join the conversation over at the RFC. Let me know if there is anything I can clarify. Looking forward to see this being natively supported!
RFC: https://github.com/halvardssm/stdext/pull/6
Thanks for your reviews! Took me a while to get back to this.
I have addressed most comments and left a few questions. Please have another look if you have the chance.
If someone could kick off CI that would be great.
CI: https://ci.nodejs.org/job/node-test-pull-request/63580/
src/node_sqlite.cc:383: Tab found; better to use spaces [whitespace/tab] [1]
src/node_sqlite.cc:475: Lines should be <= 80 characters long [whitespace/line_length] [2]
Done processing src/node_sqlite.cc
Total errors found: 2