mwparserfromhell
improper parsing of nested invalid templates (was: get_sections works for file but not for string)
This is really strange! This script works for me:
from mwparserfromhell import parse
doc = parse(open('654444027.txt'), skip_style_tags=True)
for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
# never prints
However, this one does not:
from mwparserfromhell import parse
with open('654444027.txt') as f:
    text = f.read()
doc = parse(text, skip_style_tags=True)
for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
# prints once
Here is the input file: 654444027.txt
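The only difference between the two scripts is what gets passed to parse(): an open file object versus the string returned by read(). A possible explanation, which is an assumption and not something confirmed in this thread, is that a file object is an iterable of lines, so anything spanning several lines may not be seen as one unit. The sketch below shows only the plain-Python side of that difference. It does not call mwparserfromhell, and both the to_text helper and the sample text are hypothetical.

```python
import io


def to_text(source):
    """Return the full text of `source` (hypothetical helper).

    File-like objects are read in one go; anything else is returned as-is.
    """
    if hasattr(source, "read"):
        return source.read()
    return source


# Hypothetical sample: a template that spans several lines.
sample = "== A ==\n{{foo|\n== B ==\n}}\n"

# Iterating a file-like object yields one string per line ...
as_lines = list(io.StringIO(sample))

# ... while reading it first yields the document as a single string.
whole = to_text(io.StringIO(sample))

print(len(as_lines))
print(whole == sample)
```

Normalizing the input with a helper like this before parsing makes both scripts hand the parser the same single string.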
The inconsistency is fixed in the above commit, but you've pointed out another thing that needs a more involved fix, which I'll have to work on later.
Btw, I've hand-fixed the test page for now. Use the previous revision to test :)