JavaScript is not ignored
The code inside the following script tag:
<script type="text/javascript" language="javascript">var id2 = '0.3106187978178174';</script>
was read to me as normal text by Squirt.
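This particular issue could be fixed by using innerText instead of textContent: textContent returns the text of every element, including <script> and <style>, while innerText takes rendering into account. I'm not sure how well browsers support innerText, though. (See: Node.textContent, JSFiddle.)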
I think, ideally, the tokenization process should be overhauled: neither innerText nor textContent does exactly what is needed. What is needed is something along the lines of filtering non-visible tags out of the DOM and flattening what remains into a string. (See #20, #17, #3.)
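To make the filter-and-flatten idea concrete, here is a minimal sketch, not Squirt's actual tokenizer. The name `flattenVisibleText` and the `SKIPPED_TAGS` list are assumptions made up for illustration. It relies only on the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties, so it works on real DOM nodes as well as on plain objects of the same shape.

```javascript
// Assumed, non-exhaustive list of tags whose text is never rendered as page text.
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEMPLATE"]);

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical helper: walk the tree under `root` and return its text as one
// string, leaving out comments and everything inside a skipped tag.
function flattenVisibleText(root) {
  const parts = [];

  function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      parts.push(node.nodeValue);
      return;
    }
    if (node.nodeType === COMMENT_NODE) return;
    if (
      node.nodeType === ELEMENT_NODE &&
      SKIPPED_TAGS.has(String(node.nodeName).toUpperCase())
    ) {
      return; // do not descend into <script>, <style>, ...
    }
    for (const child of Array.from(node.childNodes || [])) {
      walk(child);
    }
  }

  walk(root);

  // Pieces are joined with a space and whitespace runs are collapsed.
  // Caveat: a word split across inline elements (<b>Hel</b>lo) is split here too.
  return parts.join(" ").replace(/\s+/g, " ").trim();
}
```

In a browser this could be called as `flattenVisibleText(document.body)`. The sketch only filters by tag name. Elements hidden through CSS (for example `display: none`) would still be read, and catching those would need an extra check such as `getComputedStyle`.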
The same happens with articles on http://www.spiegel.de/.
The following content is inside a <script> tag, yet it is still shown to the reader:
<!--
if (navigator.userAgent.indexOf('iPhone') == -1) {
document.writeln('<div class="spMInline">');
document.writeln('<scr'+'ipt type="text\/javascript">');
document.writeln("OAS_RICH('Middle2');");
document.writeln('<\/scr'+'ipt>');
document.writeln('<\/div>');
}
// -->
@clarkf Another approach could be to "read ahead" and skip such content?