microformats-ruby

Return only fragment of page

Open dissolve opened this issue 8 years ago • 4 comments

I had the idea of parsing a page and pulling out only a specific comment, and wondered how that would work (assuming the comment isn't posted from somewhere else). The idea would be to give a URL that has a fragment, and the result items would contain only what is found in the element with that id and below it.

I would have to look at how exactly this would work; the parser would likely still need the whole page for rels, base, and such.

dissolve, May 13 '17 18:05

I chose to do this as part of my Microformats consuming code, XRay, rather than at the parser level. XRay first parses the HTML document to extract the node at the matching fragment, then it passes that HTML to the parser.

aaronpk, May 24 '17 14:05

Doesn't this break things like tags? Or do you just include the header?

dissolve, May 24 '17 14:05

Not sure what you mean by "things like tags". Here's what it does: https://github.com/aaronpk/XRay/blob/master/lib/XRay/Formats/HTML.php#L82

Basically, if a fragment is included, it runs $doc->saveHTML on the element with that ID and replaces the HTML that it fetched with the HTML of that element.

aaronpk, May 24 '17 15:05

lol... well then, GitHub processes this as HTML... things like <base> tags

dissolve, May 24 '17 15:05
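
The approach discussed in this thread (pull out the element whose id matches the URL fragment, then hand only that HTML to the parser) can be sketched roughly as below. This is a minimal Python sketch using only the standard library; it is not XRay's PHP code and not part of microformats-ruby. The names `FragmentExtractor` and `extract_fragment` are hypothetical, and falling back to the full document when nothing matches is an assumption. Because extracting a fragment drops the `<base>` tag, the sketch also records the `href` of the first `<base>` it sees so a caller could still resolve relative URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Elements that never have a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Hypothetical helper: collects the markup of the element with a given id."""

    def __init__(self, fragment_id):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.fragment_id = fragment_id
        self.base_href = None  # href of the first <base> tag, if any
        self.depth = 0         # > 0 while inside the target element
        self.done = False      # True once the target has been fully captured
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, self_closing=True)

    def _open(self, tag, attrs, self_closing):
        attr_map = dict(attrs)
        if tag == "base" and self.base_href is None:
            self.base_href = attr_map.get("href")
        is_target = (self.depth == 0 and not self.done
                     and attr_map.get("id") == self.fragment_id)
        if self.depth == 0 and not is_target:
            return  # outside the target element: ignore
        self.parts.append(self.get_starttag_text())
        if self_closing or tag in VOID_ELEMENTS:
            if is_target:
                self.done = True
            return
        self.depth += 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_ELEMENTS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")


def extract_fragment(html, url):
    """Return (html_to_parse, base_href) for a URL that may carry a fragment.

    Assumption: with no fragment, or no element with that id, the whole
    document is returned unchanged.
    """
    _, fragment_id = urldefrag(url)
    extractor = FragmentExtractor(fragment_id)
    extractor.feed(html)
    extractor.close()
    if not fragment_id or not extractor.parts:
        return html, extractor.base_href
    return "".join(extractor.parts), extractor.base_href
```

The returned HTML would then be passed to the Microformats parser in place of the full page, with `base_href` (and any rels gathered from the full document) supplied separately. The depth counting here assumes every non-void element is explicitly closed; markup with omitted end tags (for example `<li>` or `<p>`) would need a real DOM parser instead.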