Semantic overhaul of Scribble with RDFa
This is a proposal for an overhaul of the current Scribble implementation.
The goal is for every document generated by Scribble to be semantically annotated with RDFa + JSON-LD, and for search to be (also) implemented with HDT/SPARQL as the engine.
The benefits:
- Flexible, Datalog-like querying over every facet of a document.
- Possibility of building natural language querying on top of the RDF data and SPARQL.
- Ease of extending properties thanks to the Open World Assumption of OWL.
- Data annotation can be as granular as needed (e.g. it could include the source code line on GitHub for every paragraph).
- Semantic annotation is accessible to search engine crawlers.
- If packagsrc also implemented semantic data, data integration between the two would be seamless (e.g. you could search for all packages that use SomeFunction).
- Ability to reason about documentation (e.g. list documentation that doesn't fulfill certain criteria)
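To make the querying and reasoning points concrete, here is a minimal sketch in plain Python of a triple-pattern lookup that lists function docs without an example. It is not SPARQL or HDT, only an illustration of the idea; every prefix and name (`scrbl:FunctionDoc`, `scrbl:hasExample`, `doc:map`, ...) is hypothetical and not part of any existing ontology. In SPARQL the same question would roughly be a `FILTER NOT EXISTS` pattern.

```python
# Hypothetical triples (subject, predicate, object); all names are made up.
TRIPLES = {
    ("doc:map", "rdf:type", "scrbl:FunctionDoc"),
    ("doc:map", "scrbl:hasExample", "ex:map-1"),
    ("doc:map", "scrbl:definedIn", "pkg:base"),
    ("doc:filter", "rdf:type", "scrbl:FunctionDoc"),
    ("doc:filter", "scrbl:definedIn", "pkg:base"),
    ("doc:foldl", "rdf:type", "scrbl:FunctionDoc"),
    ("doc:foldl", "scrbl:hasExample", "ex:foldl-1"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def functions_without_examples(triples):
    """List function docs that have no scrbl:hasExample triple."""
    documented = {s for s, _, _ in match(triples, p="rdf:type", o="scrbl:FunctionDoc")}
    with_example = {s for s, _, _ in match(triples, p="scrbl:hasExample")}
    return sorted(documented - with_example)


print(functions_without_examples(TRIPLES))  # prints ['doc:filter']
```

Because the data is just triples, adding a new criterion means adding a new predicate and a new pattern, not changing the document format.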
The means/Required work:
- Implement an RDF toolset in Racket:
  - RDF Turtle/N-Quads
  - JSON-LD (implemented by @cwebber)
  - RDFa parser
  - HDT FFI (for data storage and querying)
- Build an OWL ontology for Scribble documentation/code.
- Rework the Scribble output to replace the current class-based annotation with RDFa. If you are not familiar with semantic annotation, take a look at https://schema.org/Article and scroll to the RDFa tab at the bottom.
- Data querying based on HDT files (http://www.rdfhdt.org/what-is-hdt/) for local documentation, and possibly a triple store (e.g. RDF4J) for server-side documentation.
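As a rough illustration of the output rework, the sketch below (Python, for illustration only; the real work would happen in Scribble's Racket renderer) emits one documentation section with RDFa Lite attributes in place of purely presentational classes. The function name `render_section_rdfa` and the example URL are hypothetical; the vocabulary terms (`TechArticle`, `headline`, `articleBody`, `isBasedOn`) are taken from schema.org, whereas an actual implementation would presumably use the Scribble ontology proposed above.

```python
from html import escape


def render_section_rdfa(title, body, source_url):
    """Render one documentation section as HTML annotated with RDFa Lite."""
    return (
        '<div vocab="https://schema.org/" typeof="TechArticle">\n'
        f'  <h2 property="headline">{escape(title)}</h2>\n'
        f'  <p property="articleBody">{escape(body)}</p>\n'
        # Link back to the source the section was generated from (hypothetical URL).
        f'  <a property="isBasedOn" href="{escape(source_url)}">source</a>\n'
        '</div>'
    )


html_out = render_section_rdfa(
    "Lists",
    "Functions for working with pairs & lists.",
    "https://example.org/src/lists.scrbl#L10",
)
print(html_out)
```

An RDFa parser run over such a page would yield the triples that could then be stored in HDT and queried.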
If you have any questions about the benefits or implementation details, feel free to ask here.
I'm a little confused, are you proposing this as something you'd like to do? Or the current maintainers of Scribble? The latter seems unlikely to happen since this is a very large project.
I am writing it down so that it won't be lost in case I am not able to come back to it myself, and to open it up for discussion and critical review.
The referenced issue and pull request were just the trigger: with a semantified Scribble, these issues (and more) could be solved easily.
I would love to do this myself, as this is my area of expertise. But, as mentioned on Racket Users, at the moment I can't afford to spend time on unpaid work, at least until I start having some income from my business in the making.
And yes, I agree this is a large project, but it could also interest someone as a research project in university departments dealing with semantic data, ontologies, or knowledge graphs.
In the meantime the least I can do is discuss the proposal if there is any interest at all.