
Make everything installable via PyPI

Open cmungall opened this issue 3 years ago • 11 comments

  • [ ] rewrite relation-graph in rust and use PyO3 bindings
  • [ ] use PyO3 to wrap rdftab

An alternative to wrapping rdftab is to load the statements table directly in Python. This will be slower, but it should be very straightforward if we skip loading the stanza field, which we don't use. It also has the advantage that we don't need to transform to RDF/XML using riot or robot.
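A minimal sketch of what loading the statements table directly could look like, using only the standard library. The column layout is assumed to follow rdftab's statements table (stanza, subject, predicate, object, value, datatype, language); `Statement` and `load_statements` are hypothetical names, and the triples would come from whatever RDF parser is chosen.

```python
import sqlite3
from typing import Iterable, NamedTuple, Optional


class Statement(NamedTuple):
    """One RDF triple: set `object` for an IRI/blank node, or `value` for a literal."""
    subject: str
    predicate: str
    object: Optional[str] = None
    value: Optional[str] = None
    datatype: Optional[str] = None
    language: Optional[str] = None


# Assumed schema, modelled on rdftab's statements table.
DDL = """
CREATE TABLE IF NOT EXISTS statements (
    stanza TEXT,
    subject TEXT,
    predicate TEXT,
    object TEXT,
    value TEXT,
    datatype TEXT,
    language TEXT
)
"""


def load_statements(conn: sqlite3.Connection, triples: Iterable[Statement]) -> int:
    """Bulk-insert triples, leaving the unused stanza column NULL.

    Returns the number of rows inserted.
    """
    conn.execute(DDL)
    # Prepend None for the stanza column, which is deliberately skipped.
    rows = [(None, *t) for t in triples]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Called as `load_statements(sqlite3.connect("ont.db"), triples)`, this writes everything in one transaction, which is usually the main lever for insert speed in SQLite.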

cmungall avatar Jun 13 '22 21:06 cmungall

See https://github.com/cmungall/relation-graph-py

Note it may not be necessary to wrap rdftab using PyO3; we can use any RDF library (we don't use the stanza field from rdftab).

cmungall avatar Jul 06 '22 22:07 cmungall

Consider instead: https://github.com/balhoff/whelk-rs

cmungall avatar Aug 10 '22 00:08 cmungall

@hrshdhgd @cmungall I was trying to get semsql to work today in order to troubleshoot some issues I'm having with trying to use SqlImplementation in OAK.

I had a lot of problems with version 0.1.7 of semsql, so I installed the latest version, 0.2.0, but now I'm getting this error: /bin/sh: relation-graph: command not found

For now, should I continue using semsql==0.1.* (resolves to 0.1.7)?

Error message

/bin/sh: relation-graph: command not found
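The message means the shell could not find a `relation-graph` executable on `PATH`. A minimal sketch for checking this before running a build (the helper name `find_executable` is hypothetical and not part of semsql):

```python
import shutil
from typing import Optional


def find_executable(name: str = "relation-graph") -> Optional[str]:
    """Return the full path of `name` if it can be found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("relation-graph is not on PATH; the build step will fail")
    else:
        print(f"relation-graph found at {path}")
```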

Related

I think this is intentional on OAK's end because of the above error, but I just wanted to let you know that this came up as well:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
oaklib 0.1.34 requires semsql<0.2.0,>=0.1.6, but you have semsql 0.2.0 which is incompatible.

joeflack4 avatar Aug 10 '22 15:08 joeflack4

Hi @joeflack4 - always use the latest version. If you are having issues with relation-graph (RG), file an issue here: https://github.com/balhoff/relation-graph/issues.

Did you get your issue resolved?

cmungall avatar Aug 26 '22 15:08 cmungall

Using PyO3 for RDFTab is certainly possible, but I wasn't planning to do it because I'll be using LDTab going forward. We've used PyO3 for valve.py and wiring.py and are working on using it for LDTab (using horned-owl). We're happy to share our experience.

For this purpose, I think you're probably better off just porting RDFTab to Python.

jamesaoverton avatar Aug 26 '22 15:08 jamesaoverton

@jamesaoverton - that makes sense.

The speed of rdflib is the main issue. Even though we get very fast access once we have built the sqlite db, there are still cases where latency in the build is an issue. But certainly having this as an option seems reasonable.

I'm figuring medium term python bindings to horned-owl will solve a lot of use cases...

cmungall avatar Aug 26 '22 20:08 cmungall

Please do be advised that you will encounter the following complex issues:

  • Different available instruction sets (e.g. AVX256)
  • Different architectures (Mac M1, M2, Intel...)

Do take these things into account while designing your build and deploy process. It took quite a while for us to figure out how to do this for Ensmallen.
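To make the architecture point concrete: a compiled wheel only installs on interpreters whose platform matches the one it was built for. A minimal sketch that reports what the current interpreter sees (`describe_build_target` is a hypothetical helper). It does not detect CPU instruction sets such as AVX, which need a separate check.

```python
import platform
import sysconfig


def describe_build_target() -> dict:
    """Collect the platform facts that determine which binary wheel fits."""
    return {
        "machine": platform.machine(),            # CPU architecture, e.g. x86_64 or arm64
        "system": platform.system(),              # OS name, e.g. Linux or Darwin
        "platform_tag": sysconfig.get_platform(), # platform string used when building wheels
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_build_target().items():
        print(f"{key}: {value}")
```

Running this on each target machine gives a quick list of the platform combinations a build matrix would need to cover.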

LucaCappelletti94 avatar Aug 26 '22 21:08 LucaCappelletti94

Just linking the Slack thread that Chris opened: https://obo-communitygroup.slack.com/archives/C03D93DEALA/p1661527315827469

joeflack4 avatar Aug 26 '22 21:08 joeflack4

I agree with @LucaCappelletti94: Getting PyO3 to work has been the easy part, and cross-compiling binaries for packaging has been much trickier. With a lot of effort we have a workflow to compile for major architectures and push to PyPI using GitHub Actions. This has been tested but is not yet in production: https://github.com/ontodev/valve.py/blob/valve_rs_python_bindings/.github/workflows/build-and-publish-wheels.yml

Suggestions for improvements are welcome.

jamesaoverton avatar Aug 29 '22 13:08 jamesaoverton

I have an experimental replacement for rdftab.rs:

https://github.com/INCATools/rdf-sql-bulkloader

This doesn't do any Rust binding itself; it relies on https://github.com/ozekik/lightrdf for that part. If this is fruitful, we may want to coordinate with its developers to make sure they follow best practice for releasing wheels, etc.

I am still doing perf tests (https://github.com/INCATools/rdf-sql-bulkloader/issues/1)

UPDATE: the bulkloader now uses pyoxigraph, which seems better supported.

cmungall avatar Aug 30 '22 00:08 cmungall

I added a general discussion of Rust dependencies in OAK here:

https://github.com/INCATools/ontology-access-kit/discussions/247

cmungall avatar Aug 30 '22 15:08 cmungall