isso icon indicating copy to clipboard operation
isso copied to clipboard

utils/html: Provide full control of allowed HTML elements via the configuration file

Open pkvach opened this issue 1 year ago • 8 comments

Checklist

  • [x] All new and existing tests are passing
  • [x] (If docs changes needed:) I have updated the documentation accordingly.
  • [x] I have added an entry to CHANGES.rst because this is a user-facing change or an important bugfix
  • [x] I have written proper commit message(s)

What changes does this Pull Request introduce?

These changes provide full control over the management of allowed-elements and allowed-attributes through the configuration file.

  • Removed the hard-coded list of allowed HTML elements and attributes in the sanitizer, making it fully configurable through the configuration file.
  • Updated the default list of allowed elements and attributes in isso-dev.cfg, server-config.rst, and isso.cfg.
  • Adjusted tests in test_html.py to reflect the new configurable approach for allowed elements and attributes.

Why is this necessary?

Fixes https://github.com/isso-comments/isso/issues/751

pkvach avatar Apr 16 '24 05:04 pkvach

Thank you for tackling this. I will write my more detailed thoughts on this down later.

For now:

  1. Keep in mind that this is a massive footgun for people who used this feature as intended and now all their styling is gone.
  2. The whole markdown rendering configuration is extremely un-intuitive and error prone. We essentially only pass through tunables to misaka and bleach instead of providing the admin with sane options.
  3. People expecting GitHub-flavored Markdown (GHFM) to just work are massively let down.

Thanks for the thoughtful feedback! I appreciate you taking the time to consider the impact on existing users.

Backwards compatibility is a valid concern. If you have any ideas on how we might mitigate disruption or implement the changes more seamlessly, I'd be happy to hear them.

pkvach avatar Apr 25 '24 07:04 pkvach

Backwards compatibility is a valid concern. If you have any ideas on how we might mitigate disruption or implement the changes more seamlessly, I'd be happy to hear them.

My biggest concern with this particular PR would be users who have copied a sample config file a few years ago that contained allowed-elements = and now nothing renders at all.

I have a few ideas, none of them perfect:

  1. Use a different parameter name (e.g. allowed-html-elements) for the "new" behavior and use it only if the new config key is non-empty
  2. Compare user-supplied allowed-elements against the default list and warn (in the the server logs) if the user-supplied list is missing default items, or if it is significantly shorter (e.g. it only contains 3 items, then the user most likely wanted the old behavior)

The config parser silently merging the default config with a user-supplied one also doesn't make things easier here.


As for general thoughts on our markdown rendering: Do we really want to make incremental improvements to this pile of hacks or would energy better be spent overhauling the thing and making it radically more user-friendly?

I'm happy to review and merge PRs for incremental fixes, but it feels more like treading water than tackling the real issue. Note also that both misaka as well as bleach (see here, thanks to https://github.com/isso-comments/isso/issues/936#issuecomment-2083351957) are deprecated and might not survive the jump to Python 3.13 or whatever comes next.

ix5 avatar Apr 29 '24 21:04 ix5

Thank you for your thoughts on this.

  1. Use a different parameter name (e.g. allowed-html-elements) for the "new" behavior and use it only if the new config key is non-empty

I've been thinking about that option as well, and I'm leaning more towards it. I will try to move in that direction.

pkvach avatar May 03 '24 05:05 pkvach

Made changes to PR with alternative option: added new configuration option strictly-allowed-html-elements to specify only allowed HTML tags in generated output.

pkvach avatar May 06 '24 13:05 pkvach

Hey @pkvach, your solution is well-written code-wise, but I still would like to receive some (more) input from actual users regarding their experiences and expectations with markdown/HTML allowlist handling.

How do other commenting systems (Commento, Remark42, ...) handle element/attribute whitelisting, if at all?

ix5 avatar May 22 '24 18:05 ix5

Thanks for the feedback, @ix5 . I agree that we can take our time with these changes, as there is no rush here. As for other commenting systems, I can do some research to see how they handle whitelisting.

pkvach avatar May 24 '24 07:05 pkvach

It seems that both "Remark42" and "Commento" systems do not provide the ability to specify a list of allowed elements in the configuration. Same for "Disqus".

Both use a processor for Markdown: https://github.com/russross/blackfriday

https://github.com/adtac/commento/blob/a4c5cc8b9e5749dd50ef01f94c08dd7944a6d9c1/api/markdown_html.go#L8

https://github.com/umputun/remark42/blob/f7ba43e5f159d7431c2f3a7000f21d374be541b3/backend/app/store/formatter.go#L128

Remark42 also uses a sanitizer: https://github.com/microcosm-cc/bluemonday

https://github.com/umputun/remark42/blob/f7ba43e5f159d7431c2f3a7000f21d374be541b3/backend/app/store/comment.go#L119

pkvach avatar May 28 '24 21:05 pkvach

Thank you for your extensive research! At the moment, I'm too much occupied with IRL things to delve deeper into this and make a decision. Maybe one of @jelmer, @BBaoVanC, @stefangehn @fluffy-critter would care to comment and try to form some kind of consensus.

ix5 avatar Jun 30 '24 16:06 ix5