Make everything installable via PyPI
- [ ] rewrite relation-graph in rust and use PyO3 bindings
- [ ] use PyO3 to wrap rdftab
An alternative to wrapping rdftab is to load the statements table directly in Python. This will be slower, but it should be very straightforward if we skip loading the stanza field, which we don't use. It also has the advantage that we don't need to transform to RDF/XML using riot or robot.
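A minimal sketch of the direct-load idea, using only the standard library. The column layout is an assumption meant to mirror rdftab's statements table and should be checked against the schema actually in use; `load_statements` and the tuple shape it accepts are hypothetical names introduced for illustration. In practice the triples would come from whichever RDF parser is chosen.

```python
import sqlite3

# Assumed to mirror rdftab's statements table; verify before relying on it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statements (
    stanza TEXT,
    subject TEXT,
    predicate TEXT,
    object TEXT,
    value TEXT,
    datatype TEXT,
    language TEXT
)
"""


def load_statements(conn, triples):
    """Bulk-insert triples into the statements table and return the row count.

    Each triple is a hypothetical 6-tuple:
    (subject, predicate, object, value, datatype, language),
    where `object` is set for IRI objects and `value` for literals.
    The stanza column is always left NULL, since it is not used.
    """
    conn.execute(SCHEMA)
    rows = [(None, s, p, o, v, dt, lang) for (s, p, o, v, dt, lang) in triples]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    example = [
        ("ex:a", "rdfs:label", None, "A", None, None),
        ("ex:a", "rdfs:subClassOf", "ex:b", None, None, None),
    ]
    print(load_statements(conn, example), "rows loaded")
```

Batching the insert with `executemany` inside a single transaction keeps the SQLite side cheap, so most of the build time is then spent in the RDF parser.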
See https://github.com/cmungall/relation-graph-py
Note that it may not be necessary to wrap rdftab using PyO3; we can use any RDF library (we don't use the stanza field from rdftab).
Consider instead: https://github.com/balhoff/whelk-rs
@hrshdhgd @cmungall I was trying to get semsql to work today in order to troubleshoot some issues I'm having with trying to use SqlImplementation in OAK.
I had a lot of problems with version 0.1.7 of semsql, so I installed the latest version, 0.2.0, but now I'm getting this error: /bin/sh: relation-graph: command not found
For now, should I continue using semsql==0.1.* (resolves to 0.1.7)?
Error message
/bin/sh: relation-graph: command not found
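A quick way to confirm this kind of failure from Python is to check whether the executable is on `PATH` before the build shells out to it. This is only a sketch: `find_executable` is a hypothetical helper, and it assumes the binary is invoked by the bare name `relation-graph`, as the error message suggests.

```python
import shutil


def find_executable(command="relation-graph"):
    """Return the resolved path of `command`, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("relation-graph was not found on PATH")
    else:
        print(f"relation-graph found at {path}")
```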
Related
I think this is intentional on OAK's end because of the above error, but I just wanted to let you know that this came up as well:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
oaklib 0.1.34 requires semsql<0.2.0,>=0.1.6, but you have semsql 0.2.0 which is incompatible.
Hi @joeflack4 - always use the latest version. If you are having issues with RG, file an issue here: https://github.com/balhoff/relation-graph/issues.
Did you get your issue resolved?
Using PyO3 for RDFTab is certainly possible, but I wasn't planning to do it because I'll be using LDTab going forward. We've used PyO3 for valve.py and wiring.py and are working on using it for LDTab (using horned-owl). We're happy to share our experience.
For this purpose, I think you're probably better off just porting RDFTab to Python.
@jamesaoverton - that makes sense.
The speed of rdflib is the main issue. Even though we get very fast access once we have built the sqlite db, there are still cases where latency in the build is an issue. But having this as an option certainly seems reasonable.
I'm figuring medium term python bindings to horned-owl will solve a lot of use cases...
Please do be advised that you will encounter the following complex issues:
- Different available instruction sets (e.g. AVX256)
- Different architectures (Mac M1, M2, Intel...)
Do take these things into account while designing your build and deploy process. It took quite a while for us to figure out how to do this for Ensmallen.
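As a small illustration of why one compiled wheel does not fit every user, the snippet below reports the platform details of the running interpreter using only the standard library. `describe_build_target` is a hypothetical helper; the values it returns differ between, say, an Intel Mac, an Apple Silicon Mac, and a Linux x86_64 machine, and a compiled extension has to be built for each one it should support. It does not detect CPU instruction-set extensions.

```python
import platform
import sysconfig


def describe_build_target():
    """Collect the platform details of the interpreter running this code."""
    return {
        "system": platform.system(),          # operating system name
        "machine": platform.machine(),        # CPU architecture as reported by the OS
        "platform": sysconfig.get_platform(),  # platform string used by Python's build tooling
    }


if __name__ == "__main__":
    for key, value in describe_build_target().items():
        print(f"{key}: {value}")
```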
Just linking the Slack thread that Chris opened: https://obo-communitygroup.slack.com/archives/C03D93DEALA/p1661527315827469
I agree with @LucaCappelletti94: Getting PyO3 to work has been the easy part, and cross-compiling binaries for packaging has been much trickier. With a lot of effort we have a workflow to compile for major architectures and push to PyPI using GitHub Actions. This has been tested but is not yet in production: https://github.com/ontodev/valve.py/blob/valve_rs_python_bindings/.github/workflows/build-and-publish-wheels.yml
Suggestions for improvements are welcome.
I have an experimental replacement for rdftab.rs:
https://github.com/INCATools/rdf-sql-bulkloader
This doesn't do any Rust binding itself; it relies on https://github.com/ozekik/lightrdf for that part. If this is fruitful, we may want to coordinate with the lightrdf devs to make sure they follow best practice for releasing wheels etc.
I am still doing perf tests (https://github.com/INCATools/rdf-sql-bulkloader/issues/1)
UPDATE: the bulkloader now uses pyoxigraph, which seems better supported.
I added a general discussion for Rust dependencies in OAK here:
https://github.com/INCATools/ontology-access-kit/discussions/247