Heading node should strip leading and trailing whitespace

Open roysmith opened this issue 5 years ago • 0 comments

Possibly related to #55?

import mwparserfromhell

wikicode = mwparserfromhell.parse('== Foo ==')
for heading in wikicode.filter_headings():
    print(repr(heading.title))

prints:

' Foo '

That is, the title keeps the whitespace inside the == markers. I don't know whether that's strictly incorrect, but it differs from Parsoid, which strips the whitespace (see the <h2 id="Foo">Foo</h2> at the end of its output):

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/" about="https://en.wikipedia.org/wiki/Special:Redirect/revision/1005253253"><head prefix="mwr: https://en.wikipedia.org/wiki/Special:Redirect/"><meta property="mw:TimeUuid" content="8e95a010-68b2-11eb-b0ff-a35f8f9272f7"/><meta charset="utf-8"/><meta property="mw:pageId" content="2943777"/><meta property="mw:pageNamespace" content="2"/><link rel="dc:replaces" resource="mwr:revision/1005252527"/><meta property="mw:revisionSHA1" content="9fa2ea02674418d1bab8d09bd0c639bcf220a57b"/><meta property="dc:modified" content="2021-02-06T19:36:03.000Z"/><meta property="mw:html:version" content="2.2.0"/><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/User%3ARoySmith/sandbox"/><title>User:RoySmith/sandbox</title><base href="//en.wikipedia.org/wiki/"/><link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=mediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Csite.styles&amp;only=styles&amp;skin=vector"/><meta http-equiv="content-language" content="en"/><meta http-equiv="vary" content="Accept"/></head><body id="mwAA" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><section data-mw-section-id="0" id="mwAQ"></section><section data-mw-section-id="1" id="mwAg"><h2 id="Foo">Foo</h2></section></body></html>
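
Until this is settled, a caller can normalize the title on their side. The following is only a minimal sketch of such a workaround, not a proposed change to the library: `normalized_title` is a hypothetical helper name, and it assumes that converting the title to `str` and stripping it is enough to match Parsoid's output for simple headings like the one above.

```python
try:
    import mwparserfromhell
except ImportError:  # lets the helper be imported even without the library
    mwparserfromhell = None


def normalized_title(title):
    """Return a heading title as a plain string without outer whitespace.

    Hypothetical helper (not part of mwparserfromhell). It accepts anything
    str() can convert, including the object returned by heading.title.
    """
    return str(title).strip()


if mwparserfromhell is not None:
    wikicode = mwparserfromhell.parse('== Foo ==')
    for heading in wikicode.filter_headings():
        # heading.title is ' Foo ' (as reported above); stripped, it is 'Foo'
        print(repr(normalized_title(heading.title)))
```

Note that this returns a plain `str`, so any wikicode structure inside the title (links, templates) is flattened to its source text rather than kept as nodes.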

roysmith, Feb 06 '21 19:02