node icon indicating copy to clipboard operation
node copied to clipboard

sqlite: add support for SQLite Session Extension

Open louwers opened this issue 1 year ago • 26 comments

We were talking about adding support for the Session Extension of SQLite in #53752. This is not a run-time extension but rather a feature built in to the SQLite amalgamation. It can be enabled with a compile flag and is enabled by default on some platforms.

I have never contributed to Node.js before so I will probably need some help to bring this to a conclusion.


TODO:

  • [x] Make sure sessions are deleted before the database they are attached to is closed
  • [ ] Consensus on API
  • [x] Documentation
  • [x] Allow custom conflict handler
  • [x] ~Throw with specific (documented) exception in case of conflict when applying changeset~ since SQLite doesn't consider this to be an error I return false when applying the changeset is aborted due to a conflict
  • [x] Allow generating a patchset as well
  • [x] Allow specifying table name when creating session (to only track that table)
  • [x] Implement Session.close()

Example usage:

const database1 = new DatabaseSync(':memory:');
const database2 = new DatabaseSync(':memory:');

database1.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');
database2.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');

const session = database1.createSession();

const insert = database1.prepare('INSERT INTO data (key, value) VALUES (?, ?)');
insert.run(1, 'hello');
insert.run(2, 'world');

const changeset = session.changeset();
database2.applyChangeset(changeset);
// Now database2 contains the same data as database1 

louwers avatar Aug 02 '24 17:08 louwers

Review requested:

  • [ ] @nodejs/gyp
  • [ ] @nodejs/security-wg

nodejs-github-bot avatar Aug 02 '24 17:08 nodejs-github-bot

Wouldn't this be solved if users could use their own SQLite build? That's already being considered according to the original issue.

karimfromjordan avatar Aug 02 '24 18:08 karimfromjordan

@karimfromjordan Nope, because you need access to the C API to access this feature.

louwers avatar Aug 02 '24 18:08 louwers

Can you add the appropriate documentation as well?

anonrig avatar Aug 02 '24 18:08 anonrig

Codecov Report

Attention: Patch coverage is 86.06965% with 28 lines in your changes missing coverage. Please review.

Project coverage is 88.41%. Comparing base (7b01758) to head (092a28c). Report is 125 commits behind head on main.

Files with missing lines Patch % Lines
src/node_sqlite.cc 86.93% 6 Missing and 20 partials :warning:
src/node_sqlite.h 0.00% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #54181      +/-   ##
==========================================
+ Coverage   88.40%   88.41%   +0.01%     
==========================================
  Files         654      654              
  Lines      187594   188034     +440     
  Branches    36089    36169      +80     
==========================================
+ Hits       165834   166259     +425     
+ Misses      15001    14999       -2     
- Partials     6759     6776      +17     
Files with missing lines Coverage Δ
src/node_sqlite.h 70.00% <0.00%> (-7.78%) :arrow_down:
src/node_sqlite.cc 83.80% <86.93%> (+0.47%) :arrow_up:

... and 62 files with indirect coverage changes

---- 🚨 Try these New Features:

codecov[bot] avatar Aug 02 '24 19:08 codecov[bot]

I need to make sure sessions are deleted before the database they are attached to is closed.

I think I need the DatabaseSync hold onto the Sessions it creates. Both are BaseObjects so I need to look into how to do that.

When a Session goes out of scope the corresponding DatabaseSync needs to no longer hold onto it. Conversely I also need to make sure that all operations on Session objects are no-ops after the corresponding database is closed.

louwers avatar Aug 02 '24 23:08 louwers

@anonrig I have added documentation, most of the functionality and fixed the lint issues. Could you trigger another CI run?

I want to add support for generating a patchset and then this PR is ready for review.

I have some ideas for future improvements, e.g. more fine-grained conflict handling or exposing the utilities SQLite provides for inspecting and merging changesets, but they can be added in a backward-compatible manner.

louwers avatar Aug 03 '24 19:08 louwers

This is excellent work that many projects will surely be based on. I am curious about the eventual wrapping of the entire Sessions C API (mixed & mashed into higher-level Node functions). Does this PR leave space for that? There are a few more Sessions API functions that would be of great use in Node.

TheOneTheOnlyJJ avatar Aug 04 '24 16:08 TheOneTheOnlyJJ

@TheOneTheOnlyJJ Thanks! I agree that exposing more functionality of the Session Extension in Node.js would be great. Here are some ideas:

  • Wrappers for sqlite3changeset_invert() (creates a changeset that undoes the changes in another changeset) and sqlite3changeset_concat() (combines changesets).
  • Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.
  • Allowing the contents of a changeset to be read. You probably want to reuturn a Generator for this so you don't need to read the entire changeset it in memory at once.
  • Change the onConflict handler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change.
  • A wrapper for sqlite3session_diff().
  • Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.

Here is the full API for future reference: https://www.sqlite.org/session/funclist.html

Right now I first want to gather some feedback, but I'd be interested continue working on this, either in a follow-up PR or in this one. Since the SQLite integration is still under active development I think working incrementally is the preferred approach.

louwers avatar Aug 04 '24 17:08 louwers

Good point, here's my view on them:

Yes, I was about to suggest these. There's also sqlite3session_isempty() and sqlite3session_memory_used() which are simpler (compared to the rest of the API) and could be wrapped more easily.

  • Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.

This should be a high priority right now, as long-running Node processes (like server back-ends) that would use Sessions will have their memory slowly but surely filled up with no way of freeing it without closing the database connection. Applications could run for months or even longer without closing a database connection, so not being able to close Sessions and free their memory would deter developers from using them. The database should, of course, still keep a list of all active sessions and delete them when it is closed, just in case not all Sessions were manually closed. From my point of view, this function should be wrapped as soon as possible and be included in this PR. Not having it surely decreases the chances of getting this merged, and it would be a shame to lose out on this.

  • Allowing the contents of a changeset to be read. You probably want to reuturn a Generator for this so you don't need to read the entire changeset it in memory at once.
  • Change the onConflict handler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change.
  • A wrapper for sqlite3session_diff().

Very important functionalities, but again, they're not vital for achieving functional basic interaction with the extension. These 3 points could be bundled in a follow-up PR that handles the finer-grained data management side of the API. The changegroup could be exposed as an object with its API functions bound to it as methods.

  • Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.

I've noticed this in the documentation and it's definitely a part of the API that would be crucial in large apps with large databases. But again, for the current goal of incrementally proposing a stable API that exposes the API of the extension, this is not required right now. It would also require wrapping sqlite3session_config(), which allows configuring SQLITE_SESSION_CONFIG_STRMSIZE. This could be handled after the 3 points above get sorted out.

Also, before I wrap up...

I noticed that you proposed the createSession() method as a wrapper around both sqlite3session_create() and sqlite3session_attach(), combined. Looking at the API, I'm trying to figure out if there's any potential functionality or use case where exposing these functions separately in Node would make sense. The only potential use case I see is creating Sessions in one part of the application and then attaching them somewhere else. Maybe this would be useful if using a global Session that gets created at app start and attached later in the app life cycle? But, given that a Session cannot be unattached, this is likely not going to be useful in any way, so the separation of the functions is not needed. What's you input on this, do you see any potential use cases for the splitting of these 2 functions?

TheOneTheOnlyJJ avatar Aug 05 '24 09:08 TheOneTheOnlyJJ

This should be a high priority right now

Agreed. I will still add this one as part of this PR.

What's you input on this, do you see any potential use cases for the splitting of these 2 functions?

I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:

better-sqlite3's public API must be as simple as possible. Rather than calling 3 functions in a specific order, it's simpler for users to call a single function. Rather than providing many similar functions for doing similar things (e.g., "convenience functions"), there should just be one function that is already convenient by design. Sane defaults should be applied when possible. A function's minimal call signature should be as small as possible, with progressively complex customization available when needed. Function names should only be as long as necessary to convey their purpose. For any new feature, it should be easy to showcase code examples that is are so simple that they are self-explanatory.

In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.

louwers avatar Aug 05 '24 10:08 louwers

Agreed. I will still add this one as part of this PR.

Yes, this would be ideal. For further updates other than this, I believe it's best to wait for the final form of the node:sqlite API itself.

I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:

I was checking if any additional functionality could be squeezed out by separating the functions. Given that there is none, your decision is a sensible one and I support it.

In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.

This is also the best sane default option, as every developer should customize this behavior according to their use case.

I'm not experienced in the internals of Node, so my contribution here can only amount to discussing the API design and smaller implementation details.

I guess the next step now (after implementing the Session close wrapper) is to wait for a maintainer to review this?

TheOneTheOnlyJJ avatar Aug 05 '24 10:08 TheOneTheOnlyJJ

FWIW sqlite is now a valid subsystem

avivkeller avatar Aug 05 '24 19:08 avivkeller

The Session close wrapper should be added to the TODO list at the top of the Issue.

TheOneTheOnlyJJ avatar Aug 09 '24 07:08 TheOneTheOnlyJJ

@TheOneTheOnlyJJ Added TODO and the code. 👍

louwers avatar Aug 09 '24 10:08 louwers

I merged main back in and made use of the new DatabaseSync::IsOpen method.

Could @anonrig or @cjihrig perhaps trigger a CI run?

louwers avatar Aug 23 '24 00:08 louwers

CI: https://ci.nodejs.org/job/node-test-pull-request/61380/

nodejs-github-bot avatar Aug 23 '24 13:08 nodejs-github-bot

@cjihrig There was a merge conflict on CI for some reason. I merged main back in again, could you trigger another run? 🙂

louwers avatar Aug 24 '24 15:08 louwers

CI: https://ci.nodejs.org/job/node-test-pull-request/61418/

nodejs-github-bot avatar Aug 24 '24 15:08 nodejs-github-bot

Not sure why CI is reporting a merge conflict:

17:44:52 HEAD is now at d5316e8426 Merge remote-tracking branch 'origin/main' into session-extension
17:44:52 + git status
17:44:52 HEAD detached at d5316e8426
17:44:52 Untracked files:
17:44:52   (use "git add <file>..." to include in what will be committed)
17:44:52 	env.properties
17:44:52 
17:44:52 nothing added to commit but untracked files present (use "git add" to track)
17:44:52 + git rev-parse HEAD
17:44:52 d5316e8426b89634a76a9926cfa887488b9d4523
17:44:52 + git rev-parse origin/main
17:44:52 e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:52 + [ -n origin/main ]
17:44:52 + git rev-parse origin/main
17:44:52 + REBASE_ONTO=e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:52 + git rebase --committer-date-is-author-date e272cfbf05adc8d8687aad9ede5660ab99b360c6
17:44:53 Rebasing (1/33)
17:44:53 Auto-merging src/node_sqlite.cc
17:44:53 Auto-merging src/node_sqlite.h
17:44:53 CONFLICT (content): Merge conflict in src/node_sqlite.h
17:44:53 Auto-merging test/parallel/test-sqlite.js
17:44:53 error: could not apply bc951f733b... src, test: support for SQLite Session Extension

I pinged the build team.

Edit: turns out merge commits are not supported.

Edit 2: I managed to fix it with the help of the build team 🎉

louwers avatar Aug 24 '24 17:08 louwers

@cjihrig CI was passing, but I just had to rebase again.

Would love to get some feedback on this PR. 🙏

I can understand if you want to hold off with this until the core functionality has matured.

louwers avatar Aug 25 '24 12:08 louwers

Would love to get some feedback on this PR.

I'll give the code a look. I think we need a @nodejs/sqlite team that can be pinged on PRs like this. I don't want to become a blocker (I don't currently have a lot of free time for Node, and the time I do have leading up to v23 will be largely spent on the test runner). I think it would also be good for getting more opinions on what to include in the API.

cjihrig avatar Aug 25 '24 15:08 cjihrig

Pinging some people that reviewed SQLite PRs in the past. @fhinkel @jasnell @benjamingr @anonrig @H4ad @aduh95 @atlowChemi @RedYetiDev @tniessen @targos

Feedback on the API, this functionality or the code would be very welcome.

louwers avatar Aug 25 '24 16:08 louwers

  • Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.

Just for reference in the future, this part of the API could get heavy inspiration from #54213, once it gets finalised, approved & merged.

TheOneTheOnlyJJ avatar Sep 04 '24 06:09 TheOneTheOnlyJJ

Is there any update on this?

TheOneTheOnlyJJ avatar Sep 23 '24 07:09 TheOneTheOnlyJJ

@TheOneTheOnlyJJ Working on the PR comments. I intend to get this merged!

louwers avatar Sep 23 '24 10:09 louwers

I am sorry to intrude into this conversation, but I have been working on creating a standardized interface for SQL based databases for JS (runtime agnostic) for a little while. Since I see that the API for node:sqlite has not yet reached consensus, I was hoping to join the discussion and see if an API that works for multiple database drivers can be collaborated on. If this would be of interest, I would be happy to join a conversation, or you can join the conversation over at the RFC. Let me know if there is anything I can clarify. Looking forward to see this being natively supported!

RFC: https://github.com/halvardssm/stdext/pull/6

halvardssm avatar Oct 09 '24 17:10 halvardssm

Thanks for your reviews! Took me a while to get back to this.

I have addressed most comments and left a few questions. Please have another look if you have the chance.

If someone could kick off CI that would be great.

louwers avatar Nov 02 '24 23:11 louwers

CI: https://ci.nodejs.org/job/node-test-pull-request/63580/

nodejs-github-bot avatar Nov 16 '24 15:11 nodejs-github-bot

src/node_sqlite.cc:383:  Tab found; better to use spaces  [whitespace/tab] [1]
src/node_sqlite.cc:475:  Lines should be <= 80 characters long  [whitespace/line_length] [2]
Done processing src/node_sqlite.cc
Total errors found: 2

aduh95 avatar Nov 16 '24 15:11 aduh95