Document considerations: String edit commit property, iteratively update strings, resolve conflicts

Open joepio opened this issue 4 years ago • 5 comments

It is quite common for people to work on long strings, for example when editing documents. Currently, commits use the set property, which overwrites the entire value of each property it touches. That brings two issues:

  • When working on a larger document, this will lead to large commits that fully overwrite the document. That's a bit of a bandwidth waste.
  • It prevents two users from editing the same document at the same time, as their commits are likely to conflict. (CRDTs might be an option here)

To deal with these issues, I decided to split documents into many sections (elements), which keeps commits smaller and allows concurrent editing (as long as people are working in separate elements). However, this, too, introduces problems:

  • The document version is not updated when elements change, which means that a user who looks at the versions of the document will not see the history they probably expect.
  • Elements can become orphans if they are not used. Destroying these is an option, but we should first check whether they are used anywhere else...
  • Elements can be re-used anywhere, which can be useful, but can also lead to unexpected situations where edits in one document unintentionally appear in another document.
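To make the orphan and re-use problems concrete, here is a minimal sketch in Python. The data model is an assumption for illustration only: documents are represented as a plain mapping from a document subject to the list of element subjects it contains, and `find_orphans` and `documents_using` are hypothetical helpers, not part of any existing API.

```python
def find_orphans(documents: dict[str, list[str]], elements: set[str]) -> set[str]:
    """Return the elements that no document refers to."""
    referenced = {el for element_ids in documents.values() for el in element_ids}
    return elements - referenced


def documents_using(documents: dict[str, list[str]], element_id: str) -> list[str]:
    """Return every document that contains the given element."""
    return sorted(doc for doc, ids in documents.items() if element_id in ids)


# Hypothetical subjects, for illustration only.
documents = {
    "doc/a": ["el/1", "el/2"],
    "doc/b": ["el/2"],
}
elements = {"el/1", "el/2", "el/3"}

orphans = find_orphans(documents, elements)       # el/3 is not used anywhere
shared = documents_using(documents, "el/2")       # el/2 appears in two documents
```

An element would only be safe to destroy if it shows up in `orphans`, and an edit to an element that `documents_using` reports for more than one document would show up in all of them.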

For these reasons, and some more, I'd like to use Markdown. It's very cross-compatible and easy to understand. But doing this would require solving the first two issues: either fix conflicting commits, or make them non-conflicting to begin with. Let's discuss some approaches:

Line based commits

  • The client calculates a diff between the current and the new state, and communicates at which lines it wants to insert / remove something.
  • Check out the Git data model for inspiration
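A rough sketch of what a line-based commit could look like, using Python's standard `difflib`. The operation format (`at`, `remove`, `insert`) and the function names `line_patch` and `apply_patch` are assumptions made up for this example; they are not an existing commit property.

```python
import difflib


def line_patch(old: str, new: str) -> list[dict]:
    """Describe the change from `old` to `new` as line-level splice operations."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not sent at all
        # Remove (i2 - i1) lines starting at line i1, then insert the new lines there.
        ops.append({"at": i1, "remove": i2 - i1, "insert": new_lines[j1:j2]})
    return ops


def apply_patch(old: str, ops: list[dict]) -> str:
    """Apply splice operations that were computed against `old`."""
    lines = old.splitlines()
    # Apply from the last operation to the first so earlier line numbers stay valid.
    for op in reversed(ops):
        lines[op["at"]:op["at"] + op["remove"]] = op["insert"]
    # Note: a trailing newline in the original string is not preserved here.
    return "\n".join(lines)


old = "# Title\n\nFirst paragraph.\nSecond paragraph."
new = "# Title\n\nFirst paragraph, edited.\nSecond paragraph.\nThird paragraph."

ops = line_patch(old, new)
result = apply_patch(old, ops)
```

Only the changed lines travel in `ops`, which addresses the bandwidth issue. It does not solve conflicts by itself: two clients that patch the same base version still need their line numbers reconciled, which is where a Git-style merge or a CRDT would come in.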

joepio avatar Nov 16 '21 11:11 joepio

References - potential solution: https://github.com/rust-crdt/rust-crdt

AlexMikhalev avatar Dec 02 '21 15:12 AlexMikhalev

Just had a meeting with @YousefED, who created BlockNote, which uses Yjs + ProseMirror (y-prosemirror).

  • Yjs seems pretty easy to use - you don't have to worry about manually creating events. You just have some state and ask for an update.

joepio avatar Dec 04 '23 12:12 joepio

CRDTs are certainly exciting - and Rust implementations of Yjs are emerging, which look promising (but I have only glanced at them - not sure how reliable they are for production yet)

You might also want to consider m-ld - a CRDT using RDF triples as its building block.

jonassmedegaard avatar Dec 04 '23 15:12 jonassmedegaard

Let me try to elaborate on what I am actually proposing, as I realize that it might not be at all obvious from my terse post above.

You consider using Markdown as the document format, and then breaking it into possible edit actions for use as the exchange format. You then point towards projects exploring the practice of codifying edit operations on Markdown data (or perhaps on any plain text).

What I propose is to treat the pieces of a Markdown document as the core document format, the same way the mighty Pandoc tool does - in fact directly reusing that format in Rust or in some JavaScript environment - and then to define each of those Pandoc AST components as an RDF data type, used directly as both the internal and the exchange format, thanks to m-ld.
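As a very rough illustration of that idea, the sketch below flattens a few blocks in Pandoc's JSON AST shape (objects with a `t` tag and a `c` content field) into subject-predicate-object tuples. Everything on the RDF side is an assumption for the sake of the example: the `ex:` predicates, the `#block-N` subjects and the `blocks_to_triples` helper are hypothetical, and nothing here uses the m-ld API.

```python
def inline_text(inlines: list[dict]) -> str:
    """Concatenate Str and Space inlines; other inline types are ignored here."""
    parts = []
    for inline in inlines:
        if inline["t"] == "Str":
            parts.append(inline["c"])
        elif inline["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def block_text(block: dict) -> str:
    """Extract plain text from the block types this sketch knows about."""
    if block["t"] in ("Para", "Plain"):
        return inline_text(block["c"])
    if block["t"] == "Header":
        # Header content is [level, attributes, inlines].
        return inline_text(block["c"][2])
    return ""


def blocks_to_triples(doc_subject: str, blocks: list[dict]) -> list[tuple]:
    """Map each block to a handful of triples under hypothetical ex: predicates."""
    triples = []
    for index, block in enumerate(blocks):
        subject = f"{doc_subject}#block-{index}"
        triples.append((doc_subject, "ex:hasBlock", subject))
        triples.append((subject, "rdf:type", f"ex:{block['t']}"))
        triples.append((subject, "ex:position", index))
        triples.append((subject, "ex:text", block_text(block)))
    return triples


blocks = [
    {"t": "Header", "c": [1, ["title", [], []], [{"t": "Str", "c": "Title"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"}, {"t": "Str", "c": "world"}]},
]
triples = blocks_to_triples("doc/1", blocks)
```

With such a mapping, editing one paragraph would change only the triples of that block. Storing the order as a plain integer position is a simplification; a real design would need an ordering scheme that survives concurrent inserts.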

jonassmedegaard avatar Dec 04 '23 16:12 jonassmedegaard

Thanks for the thoughts @jonassmedegaard!

I do think we'll probably need a specialized document model, which might use Yjs's model under the hood. As of now, a Document consists of Elements, which in turn have MD strings. This has some issues, especially when making real-time / CRDT style changes.

More thoughts

joepio avatar Dec 04 '23 18:12 joepio