fix: parse `select` correctly when it contains HTML element
Closes #1069. Select was the problem not options, but that issue was prior to me looking at the WHATWG documentation and the Astro parser.
Changes
- This fixes parsing of
selectelements when they contain HTML elements. - The issue is specifically with how End Tags were being handle when
inSelectmode- Only
option,optgroup,select, andtemplateend tags were being handled. - Others were unhandled, this PR just changes so that if the end tag is not any of the ones from the previous bullet point, the parser should treat the tag as if it were
inBody. - Start tag handles it this way already.
- Only
- Previously the compiler could correctly produce a
selectcontaining a single HTML element, which I'll refer to as X, or aselectcontaining anoptioncontaining a single X. - However if a HTML element, Y, is added as the next sibling to either the
select,option, or X, Y will cannibalized by X becoming its child. Each subsequent element(child or sibling) will have whatever element was preceding it as its parent.
ie:
<select id="select-span-sibling">
<span>Lemon</span>
<option>Lime</option>
</select>
<div><span>Orange</span></div>
<div>Melon</div>
produces (formatted for readability):
<select id="select-span-sibling">
<span>
Lemon
<option>
Lime
</option>
<div>
<span>
Orange
<div>
Melon
</div>
</span>
</div>
</span>
</select>
- This change fixes it so that the previous example now produces the expected output:
<select id="select-span-sibling">
<span>Lemon</span>
<option>Lime</option>
</select>
<div><span>Orange</span></div>
<div>Melon</div>
- I briefly tested the compiled WASM in a local Astro project and the generated HTML from the
.astrowas matching the expected/"correct" output. - I don't know if there are additional changes needed to be made to handle special cases or better match the spec or if this somehow broke how people were previously using it, but I believe this PR produces a more correct output than before.
Testing
I wasn't sure where best to do testing so I just added to test_printer where I could most immediately observe outputs. Additionally I wasn't sure what was the appropriate amount of tests I should add or how verbose to be with the naming, so I added only a few to cover some limited scenarios and it's far from exhaustive. I can add, remove, or change them if it is desired.
Docs
I did not change any documentation as I believe it's only a bugfix.
🦋 Changeset detected
Latest commit: f7af8186f432a7b6dcb21fd76eb66f48a289640a
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 1 package
| Name | Type |
|---|---|
| @astrojs/compiler | Patch |
Not sure what this means? Click here to learn what changesets are.
Click here if you're a maintainer who wants to add another changeset to this PR
@lbrooney are you sure about the validity of the bug? The w3c validator returns an error when span is a child of select
@lbrooney are you sure about the validity of the bug? The w3c validator returns an error when
spanis a child ofselect
So I believe you are correct in that a span is not a valid child of a select(if my understanding is correct the current spec says anything not explicity defined is an error).
To my knowledge, currently a span in an option is also not spec compliant either, but considering as of Chrome 135, with the introduction of customizable selects and the ability for them to contain rich HTML content that is changing. There is an active spec pull request here: whatwg/html#9799. Additional details can also be found in the previously linked Chrome dev blog as well.
Admittedly my example in the initial PR is bad as I'm not sure that even within the new spec changes whether that would be valid, but as of right now Chrome doesn't raise any concerns about it, it exists within the DOM, but it not visible/presented(maybe with some CSS you can make it visible). Apologies, however, I do think it demonstrated how broken the output produced could be.
I probably should have included another example in my opening issue, but for instance:
<select>
<option><span>Lemon</span></option>
<option><span>Lime</span></option>
</select>
currently produces(formatted):
<select>
<option>
<span>Lemon
<option>
<span>Lime</span>
</option>
</span>
</option>
</select>
Taking advantage of this new feature is currently impossible within an Astro component.
Blink currently has this behind a runtime flag (the function above it does the actual check). Checking the references you can see how they've changed parsing, with html_tree_builder.cc I believe being most relevent in this case.
Thanks for contributing!
Looking at the updated spec and this https://chromium-review.googlesource.com/c/chromium/src/+/5118452/10/third_party/blink/renderer/core/html/parser/html_tree_builder.cc the change looks a bit more involved than parsing with the inBody insertion mode.
Wondering if we should get rid of HTML error correction. Browsers are already optimized to do that, and it would avoid having to keep up with the HTML spec.
I posted a proposal to discuss that!
https://github.com/withastro/roadmap/discussions/1166
Thanks for contributing!
For why my solution is rather terse is that while, to my understanding, the new select spec is largely finalized, I didn't want to engineer a spec-for-spec implementation only for it to change and have it now be incorrect (long way to say I was lazy 😅). The current changes introduced by the PR solved my problems so I was satisfied with it. To your second point, if the produced HTML with this solution was not spec compliant, the browser would handle it, which while not ideal is well within their capabilities.
I can push some additional snapshot tests, but with the current PR if a select tag (properly closed/opened or not) is a child of an an element within a select, it is does not exist in the produced output.
<!-- any of these inputs -->
<select><option><span><select>Lemon</select></span></option></select>
<select><option><span><select></select>Lemon</span></option></select>
<select><option><span><select>Lemon</span></option></select>
<select><option><span></select>Lemon</span></option></select>
<!-- same output by astro -->
<select><option><span>Lemon</span></option></select>`
The behaviour on handled cases within inSelect.
<!-- select tag in option any of these inputs -->
<select><option><select></select><span>Lemon</span></option></select>
<select><option><select><span>Lemon</span></option></select>
<select><option></select><span>Lemon</span></option></select>
<!-- same output by astro -->
<select><option></option></select><span>Lemon</span>
I posted a proposal to discuss that!
In regards to your proposal, I was having similar thoughts when I initally came across this bug and found that I didn't encounter it with React TSX. I believe that's essentially how the current Astro TSX parser workers by just ignoring parse errors, and I may be incorrect in this, I believe React TSX also doesn't "correct" improper HTML.
I believe React TSX also doesn't "correct" improper HTML.
How / why should it? All the DOM knowledge is in react-dom and JSX is constructed via simple rules - which do not follow HTML(5) rules. The tree is constructed just by using methods such as createElement and appendChild, not via the HTML(5) parser (which would take a string to be parsed as a fragment, i.e., be much slower and not synced with the VDOM).
In short React / JSX does not deal at all with HTML - just with DOM nodes. The only time innerHTML is used is via the special dangerouslySetInnerHTML prop on "native" elements (i.e., handled at runtime via react-dom).