Escape HTML content
It is possible to corrupt the generated HTML by including text that looks like HTML tags within the name and desc properties. This is legal SystemRDL syntax, but the tags are passed into the output unescaped.
Consider the following case:
addrmap reg_name_stress {
    reg {
        name="The register controls the <h1> signal";
        field { name="This field has some <td> controls"; fieldwidth=1; } field_a;
        field { name="This field has some normal controls"; fieldwidth=1; } field_b;
    } reg_h;
};
This will render as follows: [image] and: [image]
I think the correct behaviour would be to escape the problematic content as it passes through peakrdl-html. For example:
The register controls the <h1> signal
should be converted to
The register controls the &lt;h1&gt; signal
This may be better solved in the systemrdl-compiler get_html_desc and get_html_name functions rather than in peakrdl-html. The markup syntax defined by SystemRDL does not use any of the characters normally escaped in an HTML string, so I think it might be safe to simply convert as follows:
| Character | Safe Form (escaped) | Comments |
|---|---|---|
| `&` | `&amp;` | |
| `<` | `&lt;` | |
| `>` | `&gt;` | |
| `\"` | `&quot;` | Note that SystemRDL has provision to include a `"` with an `\"`, which requires slightly different treatment compared to normal |
| `'` | `&#39;` | |
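The mapping in the table can be sketched with Python's standard `html.escape`. This is only an illustration, not the systemrdl-compiler implementation: the helper name `escape_rdl_text` is hypothetical, and the step that collapses `\"` to a bare quote assumes the text still carries the SystemRDL escape sequence when it reaches the helper. Note that `html.escape` emits `&#x27;` for a single quote, which is equivalent to the other numeric or named forms.

```python
import html


def escape_rdl_text(text: str) -> str:
    """Hypothetical helper: make name/desc text safe to embed in HTML."""
    # Assumption: an embedded double quote may still be written as \" here,
    # so collapse it to a bare quote before escaping.
    unescaped = text.replace('\\"', '"')
    # quote=True also escapes " and ' in addition to &, < and >.
    return html.escape(unescaped, quote=True)


print(escape_rdl_text("The register controls the <h1> signal"))
# prints: The register controls the &lt;h1&gt; signal
```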
@krcb197 I don't fully agree with some of the replacements you are proposing, partly because the SystemRDL tools actually do two different text transformation steps in the get_html_desc/name() functions:
- Convert any "RDLFormatCode" tags to HTML tags
  - These are seldom used, but technically part of the SystemRDL standard.
  - Agree that the RDLFC `[quote]` tag should generate escaped HTML `&quot;`. That seems like a sensible change.
- Optionally process text as Markdown
  - This is an optional add-on to allow far richer documentation beyond the extremely limited (and ill-conceived) RDLFormatCode.
The reason I bring this up is that in Markdown, any HTML tags found in the original text are passed through to the output HTML unchanged. This is a feature of the Markdown language, and is useful if you need to inject custom HTML to do something beyond the capabilities of Markdown. If we were to auto-escape HTML characters, it would break a pretty useful feature of Markdown. This is something I have seen used by some designers, so I am not willing to break this mechanism.
In cases where you truly need a < or > character, Markdown allows you to escape it with a backslash.
I would only support transformation into the escaped safe form in situations where Markdown processing is disabled, but I expect that is a pretty rare situation.
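A minimal sketch of that condition, assuming a hypothetical helper `prepare_text` and a hypothetical `markdown_enabled` flag (neither is taken from systemrdl-compiler): text is escaped only when no Markdown pass will follow, so inline HTML written for Markdown is left alone.

```python
import html


def prepare_text(text: str, markdown_enabled: bool) -> str:
    """Hypothetical helper: escape HTML only when Markdown is disabled."""
    if markdown_enabled:
        # Leave the text untouched so that inline HTML reaches the
        # Markdown processor and is passed through to the output.
        return text
    # No Markdown pass follows, so anything tag-like is plain text.
    return html.escape(text, quote=True)


print(prepare_text("some <td> controls", markdown_enabled=False))
# prints: some &lt;td&gt; controls
```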
@amykyta3
Thank you for the considered feedback. I had not adequately considered the case where Markdown is used as a replacement for the RDLFormatCode, and I agree the proposed change does not work with that. Most of my experience of using SystemRDL has stuck to the formatting defined by the specification, mainly because the other tooling only supported it.
I will update the PR on the compiler, limiting the scope to the `[quote]` tag as you suggest.
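The narrowed change could look roughly like the sketch below. The function name `expand_quote_tag` is hypothetical and this is not the code from the PR; it only shows the `[quote]` tag producing an escaped entity while any other HTML in the text is left as written.

```python
def expand_quote_tag(text: str) -> str:
    """Hypothetical helper: render the RDLFormatCode [quote] tag as &quot;."""
    # Only the [quote] tag is rewritten; other content, including any
    # inline HTML intended for Markdown, is left unchanged.
    return text.replace("[quote]", "&quot;")


print(expand_quote_tag("Set to [quote]on[quote] to enable"))
# prints: Set to &quot;on&quot; to enable
```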