improper parsing of nested invalid templates (was: get_sections works for file but not for string)

Open kjschiroo opened this issue 10 years ago • 2 comments

This is really strange! This script works for me:

from mwparserfromhell import parse

doc = parse(open('654444027.txt'), skip_style_tags=True)

for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
        # never prints

However, this version, which reads the file into a string first, does not work:

from mwparserfromhell import parse

with open('654444027.txt') as f:
    text = f.read()

doc = parse(text, skip_style_tags=True)

for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
        # prints once

Here is the input file: 654444027.txt
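
Since the two scripts differ only in what is handed to parse (an open file versus a string), one way to rule out that difference while debugging is to normalize the input to a str first. The following is a minimal sketch under that assumption; read_wikitext is a hypothetical helper, not part of mwparserfromhell, and it assumes UTF-8 for byte input.

```python
import io

def read_wikitext(source):
    """Return wikitext as a str, given a str, bytes, or an open file.

    Hypothetical helper for illustration; not a mwparserfromhell API.
    """
    if isinstance(source, bytes):
        # Assumption: the dump is UTF-8 encoded.
        return source.decode("utf-8")
    if hasattr(source, "read"):
        # File-like object: read the whole content at once.
        data = source.read()
        return data.decode("utf-8") if isinstance(data, bytes) else data
    return str(source)

# The same text should come out identical regardless of how it was supplied.
sample = "== A ==\ntext\n== B ==\nmore\n"
from_file = read_wikitext(io.StringIO(sample))
from_string = read_wikitext(sample)
print(from_file == from_string)
```

The returned string can then be passed to parse(text, skip_style_tags=True) as in the second script, so both code paths see exactly the same input.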

kjschiroo avatar Jan 11 '16 19:01 kjschiroo

The inconsistency is fixed in the above commit, but you've pointed out another thing that needs a more involved fix, which I'll have to work on later.

earwig avatar Jan 11 '16 23:01 earwig

Btw, I've hand-fixed the test page for now. Use the previous revision to test :)

yuvipanda avatar Jan 14 '16 18:01 yuvipanda