Simpletalk Proposal: Custom Text Editor for Field

What

This proposal outlines a custom text editor component that can be used as/in our Field Part.

Why

We have wrestled endlessly with the DOM's existing contenteditable capability, and continue to battle against its shortcomings. These shortcomings have stymied our ability to predictably and consistently deal with the structure of text and styling in our Field part.

Shortcomings include:

Inability to have full control over the cursor;
Differences across browsers in the handling of newline insertion (i.e., what happens when you push the return key);
Inability to "smartly" move the next insertion outside of some parent element. Because text looks flat but the HTML representation is a tree, this presents many problems when attempting to style/group text ranges.

Component Proposal

I propose that we create a new, vanilla webcomponent that deals with basic text editing in terms that we define and control. It will have several design rules, outlined below, that will allow us to have predictable, testable behavior, and that will have a structure that makes it more amenable to our Field behaviors.

Mirrored text and visual content

The new text editor will be composed of two simultaneously interacting sets of data: the source text (a string) and the visual display of that text (the html markup). The component will handle updating both of these at the same time during change operations, which include insertions and deletions.

At any time in the lifecycle of a program, the stored text will be said to match the contents of the HTML's text nodes as if they were extracted and appended together (with the exception of newlines, see below).

Additionally, styling on ranges of the text will be stored as special objects that provide the start and end points of such stylings within the text. These, too, will be linked to and mirrored on the visual (HTML) side.

Keeping Selections in Sync

The question of how to keep selections in sync is crucial. For example, how can we create a DOM Range object that correctly maps to the same substring of the source text (and vice versa)?

This can be solved with relative ease using the createTreeWalker document method, like so:

    _getTextNodes(){
        // Collect every descendant node (elements and
        // text nodes) in document order.
        let nodes = [];
        let walker = document.createTreeWalker(
            this,
            NodeFilter.SHOW_ALL,
            null
        );
        let currentNode = walker.nextNode();
        while(currentNode){
            nodes.push(currentNode);
            currentNode = walker.nextNode();
        }
        let lineAdjustedNodes = [];
        nodes.forEach(node => {
            if(node.nodeType === Node.TEXT_NODE){
                // Pair each text node with its string content
                lineAdjustedNodes.push({
                    str: node.nodeValue,
                    node: node
                });
            } else if(node.nodeName === this.lineElement.toUpperCase() && node.classList.contains('st-text-line')){
                // Every line element after the first marks a
                // newline, which we append to the previous record.
                if(lineAdjustedNodes.length){
                    let prev = lineAdjustedNodes[lineAdjustedNodes.length - 1];
                    prev.str += "\n";
                }
            }
        });
        return lineAdjustedNodes;
    }

Let's break down what is happening. The point of this function is to create a list of dictionaries, each of which has two values: a string version of the text at the given DOM text node, and a reference to the actual node that it comes from. We walk the tree of all DOM nodes (this includes both elements and text nodes), attempting to isolate text nodes for the most part. The only exception here is if we come across the special .st-text-line span (our central organizing idea, see below). In this case, we insert an additional newline into the stored string representation of the previously created dictionary.

If we have ensured that the presence of this line element corresponds with a newline (see below), then this will always yield the correct raw text value, and should be equivalent to the stored raw text value. What's important is the coupled information about the nodes -- this allows us to create Ranges with the correct start node, start offset, end node, and end offset.
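As a rough illustration of how the coupled node information might be used, the sketch below maps an index in the source text to a node/offset pair, which is what a DOM Range needs for its start and end. It assumes records shaped like the ones returned by _getTextNodes ({str, node}). The locateOffset helper is hypothetical, and the node values are placeholder strings rather than real DOM Text nodes so that the example runs outside a browser.

```javascript
// Hypothetical helper (not part of the prototype): find the record that
// contains the given source-text index and the offset inside its node.
function locateOffset(records, index){
    let consumed = 0;
    for(const record of records){
        const end = consumed + record.str.length;
        if(index < end){
            // Trailing newlines in str stand for line breaks and are not
            // part of the node's own text, so clamp to the node length.
            const nodeLength = record.str.replace(/\n+$/, "").length;
            return {
                node: record.node,
                offset: Math.min(index - consumed, nodeLength)
            };
        }
        consumed = end;
    }
    // An index equal to the total length points at the end of the last node.
    if(records.length && index === consumed){
        const last = records[records.length - 1];
        return {node: last.node, offset: last.str.length};
    }
    return null;
}

// Placeholder records for the two-line example used in this proposal.
const records = [
    {str: "This is the first line\n", node: "textNodeA"},
    {str: "This is the second line", node: "textNodeB"}
];
const start = locateOffset(records, 12); // inside the first line
const end = locateOffset(records, 23);   // first character of the second line
```

In a browser, the two results could then be passed to range.setStart(start.node, start.offset) and range.setEnd(end.node, end.offset), given real text nodes.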

Lines: the primary organizing property

Lines will be the main organizing principle of the contents of the text editor. A line has two related definitions in this component: a text definition and a visual definition:

text line - Any set of valid, non-newline display characters that terminates in a newline character (or a newline character that is directly preceded by another newline character)
visual line - A <span> element whose classList contains st-text-line

For the visual representation, there are some additional rules. First, DOM Text Nodes will always be enclosed by one of these line <span> elements, either as a direct parent or ancestor (styling or syntax highlighting spans can serve as intervening ancestors). This means that a Text Node will never be the direct child of the component element itself, and all text can be found by collecting the line <span> elements.

Second, the insertion of a newline into the text will create a new text line <span> element and update the cursor location (see below about the cursor). Likewise, the removal of a line of text from the text source will result in the whole corresponding visual text line <span> being removed (and vice versa).

Examples

Let's say we have the following source text:

This is the first line\nThis is the second line

It will be displayed as:

<span class="st-text-line">This is the first line</span>
<span class="st-text-line">This is the second line</span>
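A minimal sketch of the text-to-visual direction for this example, assuming one st-text-line span per newline-separated line. The renderLines and escapeHtml helpers are hypothetical names, not part of the prototype. The sketch returns one markup string per line, so nothing here introduces whitespace text nodes between the line elements.

```javascript
// Escape characters that are significant in HTML text content.
function escapeHtml(str){
    return str
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: produce the markup for each line of the source text.
// Assumes lines are separated by "\n"; a trailing newline would therefore
// yield a final, empty line element.
function renderLines(text){
    return text
        .split("\n")
        .map(line => `<span class="st-text-line">${escapeHtml(line)}</span>`);
}

const lineMarkup = renderLines("This is the first line\nThis is the second line");
// lineMarkup[0] is the first line's span, lineMarkup[1] the second's.
```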

However, if there are some styling ranges associated with some parts of the text -- for example, bolding the second and third word of each line respectively, the visual representation could also look like this:

<span class="st-text-line">This <span class="st-styling" style="font-weight:bold;">is</span> the first line</span>
<span class="st-text-line">This is <span class="st-styling" style="font-weight:bold;">the</span> second line</span>

The Cursor

One central advantage of this custom approach is that we have complete control over the cursor.

For this component, the Cursor will be a <div> element with a special class. We will style it to look just like a normal cursor (we can even animate it to "blink" if we want).

The Cursor div, like all DOM text nodes, will always live inside of a line <span> ancestor. Because we can query the location of the cursor and easily get information about its ancestors/siblings, we can more easily perform intelligent insertions and deletions.

For example, here is a rough prototype of what might happen during the handling of a backspace:

    visualDeleteBackward(){
        // If the cursor is the first child node
        // in its parent, then we move the cursor to
        // the end of the parent's previous sibling
        // and erase the parent.
        if(this.cursor === this.cursor.parentElement.childNodes[0]){
            console.log('Cursor is first child node');
            let targetElement = this.cursor.parentElement;
            let previousElement = targetElement.previousElementSibling;
            if(!previousElement){
                // Nothing precedes the parent, so there is
                // nowhere to move the cursor in this prototype.
                return false;
            }
            previousElement.append(this.cursor);
            targetElement.remove();

            // If the removed element was a text line
            // element, it's the equivalent of deleting a newline
            // in the underlying textarea. So we do not recursively
            // call the delete backward action.
            if(!targetElement.classList.contains('st-text-line')){
                return this.visualDeleteBackward();
            }
            return true;
        }
        let previousSibling = this.cursor.previousSibling;
        if(previousSibling.nodeType === Node.ELEMENT_NODE){
            console.log('previousSibling is element');
            previousSibling.append(this.cursor);
            return this.visualDeleteBackward();
        }
        if(previousSibling.nodeType === Node.TEXT_NODE){
            console.log('previousSibling is text node');
            if(previousSibling.nodeValue === ""){
                console.log('previous sib text node is empty');
                previousSibling.remove();
                return this.visualDeleteBackward();
            } else {
                // Drop the last character of the text node
                let val = previousSibling.nodeValue;
                previousSibling.nodeValue = val.slice(0, val.length - 1);
            }
        }
        return false;
    }

Here's the gist of this code. When a backspace happens, there are several possibilities:

The cursor div is directly preceded by a text node. If the node is an empty string, delete the text node and then recursively call the function (since nothing "visual" has been deleted yet); otherwise, delete the last character of that text node.
The cursor div is directly preceded by a sibling element. Append the cursor div to the end of that sibling element, then recursively call the function.
The cursor div has no preceding node at all, and therefore is the first child node in its direct parent. If the parent has no other child nodes inside of it at all, this means it is empty and should be deleted entirely from the DOM. Append the cursor to the end of its parent's parent, then remove the original parent completely. Otherwise (if there are DOM node children in the parent), append the cursor to the end of the parent's parent. In both cases we recursively call the function.

There is a little more to this when it comes to removing line <span> elements, however, since they also require removing a newline char from the underlying source text representation. But the complete control is simple enough to handle and this should give you an idea of the advantages of a completely controlled cursor.

Styling Ranges

We want to be able to style specific substrings of the source text easily. This can be done using some structured collection (a dictionary, list, or similar) of custom styling range objects, which all have the following properties at minimum:

Start index of the substring in the full text representation;
End index of the substring in the full text representation;
Something about what styling to apply (perhaps a class or the contents of the inline style= attribute).

When used in combination with something like the tree walking function described earlier, we have enough information to reliably add <span> wrappers around the DOM nodes corresponding to the substring.

There is one complication, however. What if a given styling range crosses several DOM nodes that are not contained in the same hierarchical subtree? For example, imagine a styling range that encloses all of one line and only half of the next line. What do we do in this case?

Because lines are our primary organizing element, when applying the style range we can simply break the resulting style <span> elements between them. In the example case, the result would be two <span> elements: one as the sole child element of the first line (thus enclosing all nodes within the line) and one that encloses only the sub-substring nodes of the second line. Both are easily accomplished using the DOM Range API, since we have easy access to correct node and offset information via the treewalker function.
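The line-breaking step can be sketched on the text side alone. The following assumes a styling range object of the form {start, end, style} with an exclusive end index. Both the field names and the splitRangeByLine helper are illustrative assumptions, since the proposal does not fix them. Each resulting segment is relative to its own line and would correspond to one style <span>.

```javascript
// Hypothetical helper: break one styling range into per-line segments.
// Offsets in the returned segments are relative to the start of each line.
function splitRangeByLine(text, range){
    const segments = [];
    let lineStart = 0;
    text.split("\n").forEach((line, lineIndex) => {
        const lineEnd = lineStart + line.length; // exclusive; newline excluded
        const start = Math.max(range.start, lineStart);
        const end = Math.min(range.end, lineEnd);
        if(start < end){
            segments.push({
                line: lineIndex,
                start: start - lineStart,
                end: end - lineStart,
                style: range.style
            });
        }
        lineStart = lineEnd + 1; // step over the newline character
    });
    return segments;
}

// A range covering all of the first line and "This is" on the second.
const segments = splitRangeByLine(
    "This is the first line\nThis is the second line",
    {start: 0, end: 30, style: "font-weight:bold;"}
);
```

Here segments holds two entries: one spanning the whole first line (0 to 22) and one spanning the first seven characters of the second line.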

Syntax Highlighting

Syntax highlighting is extremely error-prone and difficult, but it is especially so under the plain contenteditable system. This is because wrapping bits of parsed text in <span> elements corresponding to some grammatical aspect is not like styling ranges! There is much more structure in wrapped bits of syntax.

Consider this example. In Simpletalk, the MessageHandler rule is a grouping of a MessageHandlerOpen, StatementLines, and a MessageHandlerClose. Each of those parts also has its own set of constituent parts that would need to be highlighted too, but ignore those for now. The important point here is this: how do I know if the lines I am actively typing are "inside" of a MessageHandler? In other words, if I am typing in the body of a handler and hit the return key, what prevents the text editor from creating a new line <span> outside of the enclosed handler rather than in it? And, furthermore, how do I hit return to make a newline outside of the handler?

With contenteditable this was a real nightmare. However, under this proposal we have access to more useful information. Because we know the cursor position, and because line spans are the organizing principle, any syntax highlighter can do with the cursor what it needs to, as well as deal with any custom wrapping elements it inserts. For example, once a MessageHandlerOpen is parsed and highlighted, any subsequent line <span> elements can be assumed to be inside of a potential enclosing handler. It's only when pushing return at the end of a parsed MessageHandlerClose that a newline will then be created outside of the enclosing handler.

If we take this route, I'm confident we will find some hiccups with syntax highlighting. But the complete control afforded to any highlighter object will be paramount to making this all work correctly. The new custom text editor is exactly what can give it such control.

Questions / Comments?

@dkrasner @ApproximateIdentity Let me know what you think about the outline of this proposal, and any questions / potential issues you have.

I have prototyped a vanilla version of some of what I am talking about and it seems to be working well so far.

Jun 11 '21 14:06 darth-cheney

@darth-cheney This all makes a lot of sense and I can't imagine any more wrangling with contenteditable & friends. There are just no good solutions there in my view, if all of the above is to be satisfied.

Some further considerations and questions (it's likely you already thought about much of this but I'll add it here for completeness):

Properties & State

We'll want to store both html and the text as properties. The text property can be dynamic and on change can update the html property, by generating the html as you describe above. This will require parsing for new lines, but potentially for more (see "On Chunking" below). When the html property is updated the view will handle that change and update the DOM.

When actively typing there are two options:

a) keyboard input -> text change -> text property update -> html property update -> notification to view/DOM update
b) keyboard input -> text change -> (text property update && view/DOM update) -> html property update (without notification)

Option a is consistent with our model, but it will likely be jittery and add cursor position complexity. Option b is more or less what we have now.

When updating the text property in ST (say via the set command): text property update -> html property update -> notification to view/DOM update (putting the cursor in the first line, 0th position)

I would also consider adding "private" properties, which should never be exposed in ST itself, or at least not settable (note: this is different from readOnly, although atm we could probably get away with using readOnly). Candidates would include html and id. This can be part of a bigger program of property annotation, which would have been nice for the editor.

On Chunking

With the above it's likely we can get a lot of chunking almost for free via DOM queries.

Getting the nth line would be equivalent to parentElement.querySelector("span.st-text-line:nth-child(N)").

Similarly when parsing the text into html, or generating html from text, we could add classes for paragraphs (if the text is not a script and a new line starts with a tab for example, or whatever logic). Then similarly you could get the nth paragraph with a query like parentElement.querySelectorAll("span.st-text-line.st-text-paragraph")[N - 1] (a plain :nth-child(N) would count all sibling elements, not just the paragraph ones).

But in the same vein you could ask for the nth handler, if we have corresponding css class markup or whatever we decide there.

Sub-fields

Although this never worked quite right, I think it's worth making it happen. Instead of doing selections and ranges like in the current version, perhaps the easiest would be to simply directly update the text of the parent field, which would then update the visual text/html. For example imagine you have the following field:

This is the first line\nThis is the second line

<span class="st-text-line">This is the first line</span>
<span class="st-text-line">This is the second line</span>

and you open a sub-field on

first line\nThis is the

which visually looks like

<span class="st-text-line">first line</span>
<span class="st-text-line">This is the</span>

If you start typing, somewhere in the sub-field

first XXXline\nThis is the

this would update the (parent) field text with

This is the first XXXline\nThis is the second line

and the visual text/html. Exactly as if you are typing in both at the same time.

We would just need to have a mechanism that updates (slices in) the corresponding parent field selection.
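One possible shape for that slicing mechanism, sketched on plain strings. It assumes a sub-field is tracked as a start/end index pair into the parent text. The openSubField and applySubFieldEdit names are hypothetical, and regenerating the visual text/html is left out.

```javascript
// Hypothetical: a sub-field is a window [start, end) onto the parent text.
function openSubField(parentText, start, end){
    return {start, end, text: parentText.slice(start, end)};
}

// Slice the edited sub-field text back into the parent and return both
// the updated parent text and the updated (resized) sub-field window.
function applySubFieldEdit(parentText, subField, newSubText){
    const updatedParent =
        parentText.slice(0, subField.start) +
        newSubText +
        parentText.slice(subField.end);
    return {
        parentText: updatedParent,
        subField: {
            start: subField.start,
            end: subField.start + newSubText.length,
            text: newSubText
        }
    };
}

const parentText = "This is the first line\nThis is the second line";
const subField = openSubField(parentText, 12, 34); // "first line\nThis is the"
const result = applySubFieldEdit(parentText, subField, "first XXXline\nThis is the");
// result.parentText now reads "This is the first XXXline\nThis is the second line"
```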

Question

I am a little confused about what you mean by

> The important point here is this: how do I know if the lines I am actively typing are "inside" of a MessageHandler? In other words, if I am typing in the body of a handler and hit the return key, what prevents the text editor from creating a new line <span> outside of the enclosed handler rather than in it? And, furthermore, how do I hit return to make a newline outside of the handler?

Why would we want to prevent a new line here? The principal actor in ST is a line and the grammar decides what kind of line it is (handler open, handler close, statement) so this seems 1-1 with the html above.

For example,

on click
     answer "OK"
end click

would correspond to

<span class="st-text-line">on click</span>
<span class="st-text-line">\tanswer "OK"</span>
<span class="st-text-line">end click</span>

so when I hit return after on click I would expect a new line there. At that very moment we could check whether the (finished) line matches a MessageHandlerOpen and add a corresponding class to the css etc

Jun 11 '21 16:06 dkrasner

@dkrasner

Some clarifications and answers to your questions inline.

First, I'm thinking the <st-text-editor> should be a vanilla webcomponent that handles all the basic functionality we care about, independently of ST. Its sole (for now) consumer in our system, FieldView, will wrap that component.

Additionally, I think it's best to have the HTML content of such an editor element available in the live DOM (eg, slotted) rather than exclusively in the shadow DOM. This is so anyone can style it as needed.

> When actively typing there are two options:
>
> a) keyboard input -> text change -> text property update -> html property update -> notification to view/DOM update
> b) keyboard input -> text change -> (text property update && view/DOM update) -> html property update (without notification)
>
> a is consistent with our model, but it will likely be jittery and add cursor position complexity. b is more or less what we have now.
>
> When updating the text property in ST (say via the set command): text property update -> html property update -> notification to view/DOM update (putting the cursor in the first line, 0th position)

Setting aside the issue of the basic text editor component not specifically needing to have this (but only interfaces making it possible), I would propose some modification to the above assumptions. I think we will rarely want to update the DOM/HTML visuals based on changes to the stored text representation specifically. The only times I think we really need this are on deserialization (if there is no HTML available) and when pasting in text.

All interaction from the user should take place directly on the DOM elements involved, after which the underlying stored text will be updated. We can have special-case functions for going in the reverse, and those hopefully won't be too much trouble.

> We'll want to store both html and the text as properties.

Yes, but on Field/FieldView and not necessarily the text editor component.

I have been imagining that both of these will be dynamic props. The reason is the current context of the editor and what "setting" or "getting" should do. For example, if the editor has a selection, something like set "text" to "blahblah" should replace only the contents of the selection with the passed in string. If there is no selection, the Field will replace all the text with the incoming string. If we don't want to have this behavioral distinction, we can split it into two properties "text" and "selection-text".
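A small sketch of that selection-dependent behavior on plain data. It assumes the editor state can be reduced to a text string plus an optional {start, end} selection with an exclusive end. The setText name and the state shape are illustrative only, as is the choice to leave the selection covering the newly inserted string.

```javascript
// Hypothetical setter: with a selection, replace only the selected
// substring; without one, replace the whole text.
function setText(state, incoming){
    if(state.selection){
        const {start, end} = state.selection;
        return {
            text: state.text.slice(0, start) + incoming + state.text.slice(end),
            // Assumption: the selection ends up covering the inserted string
            selection: {start: start, end: start + incoming.length}
        };
    }
    return {text: incoming, selection: null};
}

const withSelection = setText(
    {text: "hello world", selection: {start: 6, end: 11}},
    "blahblah"
);
const withoutSelection = setText(
    {text: "hello world", selection: null},
    "blahblah"
);
```

With the selection over "world", only that word is replaced. Without a selection the whole text becomes "blahblah".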

One outstanding question at this point is this: do we want to have the html property be read-only? My take is that any changes to the html from Simpletalk should only occur through a) Direct user input; b) Special built-in commands. This is because we do not want to be resetting the complete HTML tree each time one character updates. That will result in stuttered visuals, especially if we are also using things like syntax highlighting.

> Why would we want to prevent a new line here? The principal actor in ST is a line and the grammar decides what kind of line it is (handler open, handler close, statement) so this seems 1-1 with the html above.

This has more to do with the structure of our grammar and how ohm semantics work.

First, our grammar doesn't actually describe everything in terms of lines, and therefore we can't just "parse a single line" and see what it is. For example, MessageHandlerOpen is defined as ending in a newline (therefore it only parses once return is pushed in the editor), but MessageHandlerClose does not include the newline. If I have entered only an opener, and hit return, I have a parsed opener but not a fully parsed MessageHandler. Only when I type the first character after end in the handler close will the full MessageHandler be parsed. The behavior of hitting return after the handler open vs after the handler close is different, and the syntax highlighter needs to easily be able to account for this. It was almost impossible in contenteditable, but I think this new setup will make cases like that easy enough to deal with.

I can step through this using your click handler example (note: I'm abbreviating <span class="st-text-line"> as just <line>):

Step 1: I have typed the handler open but not yet hit return:

<line>on click<div class="cursor"></div></line>

Step 2: I now push return:

<line><span class="syntax" data-rule="MessageHandlerOpen">on click</span></line>
<line><div class="cursor"></div></line>

Step 3: I type "end" plus the first character of message name, which will match on MessageHandlerClose and highlight the syntax, but I have not yet pushed return. This not only highlights the handler close, but realizes there's a whole MessageHandler, and wraps that accordingly.

<span class="syntax" data-rule="MessageHandler">
    <line><span class="syntax" data-rule="MessageHandlerOpen">on click</span></line>
    <line><span class="syntax" data-rule="MessageHandlerClose">end c<div class="cursor"></div></span></line>
</span>

Here is where we run into the problem: if I push return, where does the cursor go and in which element do we create the newline? By default, return jumps the cursor out of the current line, appends a new line element to the parent, and then appends itself into that new line. If this happened as the next step in this example, the new line would not be in the root, but inside of the <span> that describes a MessageHandler, which is incorrect.

<!-- What will happen, but is not correct -->
<span class="syntax" data-rule="MessageHandler">
    <line><span class="syntax" data-rule="MessageHandlerOpen">on click</span></line>
    <line><span class="syntax" data-rule="MessageHandlerClose">end c</span></line>
    <line><div class="cursor"></div></line>
</span>

<!-- What we need to have happen -->
<span class="syntax" data-rule="MessageHandler">
    <line><span class="syntax" data-rule="MessageHandlerOpen">on click</span></line>
    <line><span class="syntax" data-rule="MessageHandlerClose">end c</span></line>
</span>
<line><div class="cursor"></div></line>

Therefore the syntax highlighter's handling of the MessageHandlerClose highlighting needs to be smart enough to create the new line element two ancestors up and then move the cursor there. Allowing the various highlighting rules to be aware of this structure was damn near impossible with contenteditable because of inconsistent DOM insertions. But now I think it's just a matter of allowing the syntax highlighter to be aware of other syntax elements and to move the cursor accordingly.
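To make the decision concrete, here is a rough sketch of how a highlighter might pick the element that receives the new line on return. Plain objects stand in for DOM elements so the example runs without a browser. The makeNode and newLineParentFor helpers are hypothetical, while the rule names come from the example above.

```javascript
// Build a tiny stand-in tree node: {rule, parent, children}.
function makeNode(rule, parent = null){
    const node = {rule, parent, children: []};
    if(parent){
        parent.children.push(node);
    }
    return node;
}

// Hypothetical decision: given the line holding the cursor, return the
// element that the new line should be appended to when return is pushed.
function newLineParentFor(line){
    const wrapper = line.parent;
    const closesHandler = line.children.some(
        child => child.rule === "MessageHandlerClose"
    );
    if(wrapper.rule === "MessageHandler" && closesHandler){
        // Step out of the completed handler: the new line belongs to
        // whatever contains the MessageHandler wrapper.
        return wrapper.parent;
    }
    // Default behavior: the new line is a sibling of the current line.
    return wrapper;
}

// Mirror of the structure in Step 3
const root = makeNode("root");
const handler = makeNode("MessageHandler", root);
const openLine = makeNode("line", handler);
makeNode("MessageHandlerOpen", openLine);
const closeLine = makeNode("line", handler);
makeNode("MessageHandlerClose", closeLine);

const target = newLineParentFor(closeLine); // the root, not the handler
```

Pushing return on the open line would still resolve to the handler wrapper, so lines typed in the body stay inside it.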

Jun 11 '21 19:06 darth-cheney

can't we just define MessageHandlerClose in the grammar to terminate with a newline? so once you hit return you get a fully parsed message handler?

Jun 11 '21 20:06 dkrasner

also i am not quite sure why we can't simply have a 'flat' line structure for code

<line><span class="syntax" data-rule="MessageHandlerOpen">on click</span></line>
<line><span class="syntax" data-rule="CommentLine">--...</span></line>
<line><span class="syntax" data-rule="StatementLine">...</span></line>
<line><span class="syntax" data-rule="StatementLine">...</span></line>
<line><span class="syntax" data-rule="MessageHandlerClose">end click</span></line>
<line><div class="cursor"></div></line>

and within the statement lines we can have certain keywords highlighted, like if/then/else/end if/while/for etc

What do we lose in this case? (I suppose code folding is one example, but what else?)

Jun 11 '21 20:06 dkrasner

> First, I'm thinking the <st-text-editor> should be a vanilla webcomponent that handles all the basic functionality we care about, independently of ST. Its sole (for now) consumer in our system, FieldView, will wrap that component.

yeah that sounds right, and i missed it above. Can we then just get away with a text prop in ST? or you think the underlying parsing would be too heavy on de-serialization if we have a lot of these components with a bit of text?

the reason i missed it i think is that it's not clear to me how this html generation will work. If it happens strictly in the 'vanilla' webcomponent that doesn't know anything about ST, then how does it highlight syntax? Or is syntax highlighting something that st-field does on the html generated by the webcomp?

Jun 11 '21 20:06 dkrasner