Allow Elements to be passed to parse_*()
Addresses #10
@Mimino666 There's no documentation here yet (I wasn't going to add it until you're happy with what I've done)
Handling parse() was an unexpected quirk: if we only have an Element, it doesn't look like we can tell whether the document was parsed as HTML or XML, so we don't know whether to use an XML or an HTML extractor.
We can guess based on the presence (or not) of a namespace on the Element, but you can still parse XML snippets without a namespace, so that could still lead to unexpected results. It also has the side effect of casting the Element back to a string as part of the XML header snooping, which is what we were trying to avoid in the first place (although a check for this could be added).
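To illustrate why the namespace guess is unreliable, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for the real element type, and guess_is_xml is a hypothetical helper, not something in this PR:

```python
import xml.etree.ElementTree as ET


def guess_is_xml(element):
    """Hypothetical heuristic: treat a namespaced tag as a sign of XML."""
    # ElementTree stores a namespaced tag as '{namespace-uri}localname'.
    return isinstance(element.tag, str) and element.tag.startswith('{')


# Both snippets are parsed as XML, but only one declares a namespace.
namespaced = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
)
plain = ET.fromstring('<feed><title>t</title></feed>')

print(guess_is_xml(namespaced))  # True
print(guess_is_xml(plain))       # False, although this was parsed as XML too
```

The second result is the problem case: an XML snippet without a namespace is indistinguishable from HTML under this heuristic.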
I've opted to force the caller to be explicit: if you want to pass an Element, you must call parse_html() or parse_xml() rather than parse().
Calling code would now look like:
    def test_element_as_parser(self):
        """
        we can pass an Element as the extractor to parse_*()
        """
        html = '''
        <div><span>Hello world!</span></div>
        <div></div>
        <div><span>Hello mars!</span></div>
        '''
        # take only the last container so we can verify that the correct descendant is chosen
        container = Element(css='div', count=3).parse(html)[2]
        val = Element(css='span', count=1).parse_html(container)
        self.assertEqual(val.tag, 'span')
        self.assertEqual(val.text, 'Hello mars!')
The important line is val = Element(css='span', count=1).parse_html(container). Instead of re-parsing the tree, the container Element passed to parse_html() is simply wrapped in a new HtmlXPathExtractor.
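As a rough sketch of that wrapping idea (not the actual implementation): SketchExtractor below is a hypothetical class, and the standard library's xml.etree.ElementTree stands in for the real parser and for HtmlXPathExtractor. The point is only that an already-parsed node is stored as-is rather than serialized and parsed again.

```python
import xml.etree.ElementTree as ET


class SketchExtractor:
    """Hypothetical stand-in for an extractor that holds a parsed root node."""

    def __init__(self, root):
        self.root = root

    @classmethod
    def from_body(cls, body):
        # Strings are parsed; anything else is assumed to be an already-parsed
        # node and is wrapped as-is, so no serialize/re-parse round trip occurs.
        if isinstance(body, str):
            return cls(ET.fromstring(body))
        return cls(body)

    def select(self, path):
        # ElementTree's limited path syntax stands in for real CSS/XPath support.
        return self.root.findall(path)


doc = SketchExtractor.from_body(
    '<body>'
    '<div><span>Hello world!</span></div>'
    '<div/>'
    '<div><span>Hello mars!</span></div>'
    '</body>'
)
container = doc.select('div')[2]                # third container, already parsed
wrapped = SketchExtractor.from_body(container)  # wrapped, not re-parsed
span = wrapped.select('span')[0]

print(wrapped.root is container)  # True
print(span.text)                  # Hello mars!
```

Because the wrapped root is the same object as container, the lookup for span is scoped to that container's descendants only.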