
Contextually show where links can be found in the Wikipedia pages themselves

Open DyeffersonAz opened this issue 7 years ago • 9 comments

Show where each link was found, because sometimes I can't find the link anywhere on the page.

DyeffersonAz, Oct 10 '18

Thanks for the suggestion! I agree it would be a cool feature, but given the data source I'm using, it is not really easy to do. I don't ever actually see the full text of the Wikipedia page itself, just the Wikipedia database containing all the links. So I can't easily show you the context around where the link shows up in the actual page. Also, since the database is only updated monthly, it is possible the link is actually no longer on the page itself as it may have been edited since the latest database dump. Maybe I'll figure out a way to do this in the future, but for now, this is not feasible with my current architecture.

jwngr, Oct 10 '18

Couldn't you fetch the HTML of the page?

DyeffersonAz, Oct 10 '18

I definitely could try something like that and I honestly think that is the way this would need to be implemented. But it wouldn't be very efficient and the system currently doesn't ever look at the raw HTML.

jwngr, Oct 10 '18

Also, that would be better than depending on repeated database dumps, since it would stay up to date automatically.

DyeffersonAz, Oct 10 '18

There is no way to run the actual search algorithm against live pages, as it would take way too long: thousands to tens of thousands of pages need to be touched. What I was referring to was just pulling the context for a single page when you, for example, click on it in the graph view.

jwngr, Oct 12 '18
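A minimal sketch of the "single page on click" idea, assuming the live HTML is fetched from the MediaWiki Action API (`action=parse` on English Wikipedia) only for the page the user clicked. The function names `build_parse_url` and `fetch_page_html` are hypothetical and not part of SDOW; the point is that this costs one request per click rather than one per page touched by the search.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia's Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_parse_url(title: str) -> str:
    """Build an Action API URL that returns the rendered HTML of one page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",        # only the rendered page body
        "format": "json",
        "formatversion": "2",  # makes parse.text a plain string
        "redirects": "1",      # follow the page if the title is a redirect
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_page_html(title: str, timeout: float = 10.0) -> str:
    """Fetch the current HTML of a single page (one network request)."""
    # Wikimedia asks API clients to send a descriptive User-Agent.
    req = Request(build_parse_url(title),
                  headers={"User-Agent": "sdow-context-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["parse"]["text"]
```

Because this reads the live page, the HTML can differ from the monthly database dump the search ran on, so a link shown in the results may legitimately be missing from it.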

Yep

DyeffersonAz, Oct 14 '18

Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.

Quifisto, Nov 06 '19

> Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.

This is what I was suggesting: SDOW could go to the live Wikipedia page, search for each link, and return, for example, the heading of the section that contains that <a> element. Unfortunately, I don't yet know enough web development to help with this.

DyeffersonAz, May 04 '20
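A minimal, self-contained sketch of the scraping step suggested above, assuming the page HTML has already been fetched (for example from the MediaWiki Action API's `action=parse` output, where article links look like `/wiki/Page_title`). In that HTML the section heading is usually a preceding sibling of the paragraph holding the link rather than a DOM parent, so the sketch walks the document in order and reports the nearest preceding `<h2>` to `<h6>` heading for each matching `<a>`. `find_link_sections` and `LinkSectionFinder` are hypothetical names, not SDOW code.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote

HEADING_TAGS = {"h2", "h3", "h4", "h5", "h6"}


def normalize_title(title: str) -> str:
    """Underscores become spaces; the first letter is case-insensitive."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def title_from_href(href: str) -> Optional[str]:
    """Map '/wiki/Page_name' or './Page_name' to a normalized title."""
    for prefix in ("/wiki/", "./"):
        if href.startswith(prefix):
            path = href[len(prefix):].split("#", 1)[0]  # drop any #fragment
            return normalize_title(unquote(path))
    return None  # edit links, external links, etc.


class LinkSectionFinder(HTMLParser):
    """Collect the heading of every section containing a link to target."""

    def __init__(self, target_title: str) -> None:
        super().__init__()
        self.target = normalize_title(target_title)
        self.current_section = "(lead)"  # text above the first heading
        self.sections: List[str] = []
        self._heading_parts: Optional[List[str]] = None  # set inside a heading
        self._skip_spans = 0  # depth inside an "[edit]" span in a heading

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in HEADING_TAGS:
            self._heading_parts = []
        elif self._heading_parts is not None and tag == "span":
            classes = (attr_map.get("class") or "").split()
            if self._skip_spans or "mw-editsection" in classes:
                self._skip_spans += 1
        elif tag == "a" and self._heading_parts is None:
            title = title_from_href(attr_map.get("href") or "")
            if title == self.target and self.current_section not in self.sections:
                self.sections.append(self.current_section)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_parts is not None:
            self.current_section = "".join(self._heading_parts).strip()
            self._heading_parts = None
            self._skip_spans = 0
        elif tag == "span" and self._skip_spans:
            self._skip_spans -= 1

    def handle_data(self, data):
        if self._heading_parts is not None and not self._skip_spans:
            self._heading_parts.append(data)


def find_link_sections(page_html: str, target_title: str) -> List[str]:
    """Return section headings (in page order) where target_title is linked."""
    finder = LinkSectionFinder(target_title)
    finder.feed(page_html)
    finder.close()
    return finder.sections
```

Known gaps in this sketch: links that point at a redirect rather than the target title will not match, infobox links are reported as "(lead)", and navigation boxes at the bottom of the page are attributed to the last heading (often "External links").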

That would be very nice, since I cannot find any of the links shown in the results on either of the pages requested.

xavzz, Mar 10 '24