Definition of an explicit installation sequence of dependencies?
Hi all,
This question might be more related to Gradle than to pygradle, so please redirect me if necessary. Anyhow, I think it's worth discussing in the context of pygradle here.
Currently, I am trying to build a minimal working example in the setting of Machine Learning (say, a getting started project on the Iris dataset) using:
- Docker as a build container
- pygradle to build my python project
- pivy-importer to have a locally cached PyPI repository (I am considering moving that to Artifactory in the future, but since we don't own a Pro instance here, I am stuck with the open source version)
It is my aim to make a PR for another example project once everything runs smoothly. Currently, I am facing one last problem...
Among others, I am defining Python project dependencies on:
- scipy (0.18.1) and
- scikit-learn (0.18).
scikit-learn depends on scipy, but does not declare this dependency explicitly in its metadata. As a result, if one tries to install scikit-learn (even in a fresh environment), the installation fails because the required scipy library is missing (see the attached log1.txt file for this scenario from within pygradle).
From the same log1.txt file, I deduce that dependencies are installed in alphabetical order (I have tested some permutations in the build.gradle file without success). Since "scikit-learn" < "scipy" (in ASCII order), the build step "installPythonRequirements" will always fail when resolving dependencies in alphabetical order.
As a workaround, one can
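The ordering claim can be checked outside Gradle. The following is a minimal Python sketch; it assumes the sort is a plain lexicographic comparison of the package names, which the log suggests but which is not confirmed against the pygradle source here.

```python
# Illustration only: compare the two package names the way a plain
# lexicographic (code-point) sort would. This is an assumption about
# how the install order is derived, not pygradle's actual code.
requirements = ["scipy", "scikit-learn"]  # order as declared in build.gradle

install_order = sorted(requirements)
print(install_order)  # ['scikit-learn', 'scipy']

# Both names share the prefix "sci"; the first differing characters decide.
print(ord("k"), ord("p"))  # 107 112
```

Because "k" sorts before "p", scikit-learn ends up first regardless of the order in which the two entries are written.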
- remove the dependency for scikit-learn for the first build run,
- wait for the import error in the source code,
- re-add the dependency for scikit-learn, and
- run the build step again.
Since the second run uses the same virtualenv as the first, scipy is correctly detected. This, in turn, results in a successful installation of scikit-learn, so that the build finally succeeds.
Therefore my question (for this scenario, though one can imagine many others): how can I define or configure that scikit-learn is installed after scipy?
Thanks! André
Hi @busche, do you know if the scipy or scikit-learn maintainers are aware of this missing metadata (i.e., the dependency declaration)? If that were properly modeled, this wouldn't be an issue. I think @zvezdan might have a workaround for you that we use with other internal dependencies that have this type of bad or missing metadata.
@sholsapp This is an issue because these scientific Python packages are not pure Python packages but rather Python/C/Fortran libraries that unfortunately depend on each other's C shared libraries to build. Standard Python packages don't have this issue and can be installed out-of-order from their dependencies.
We do have the flexibility to change this in pygradle, though, very easily. I'm currently busy with something and want to test it before replying to @busche. I'll have something here this evening (Pacific time).
Hi @zvezdan,
Do you have any updates on this issue? Maybe I can lend a hand in testing something?
Best, André
@busche You can put the dependencies in the order you want to enforce in build.gradle:
    dependencies {
        // ...
        python 'pypi:scipy:0.18.1'
        // ...
        python 'pypi:scikit-learn:0.18'
        // ...
    }
Then add (probably before this section):
    project.tasks.findByName('installPythonRequirements').sorted = false
If you want to avoid depending on a specific internal name, you can use this instead:
    import com.linkedin.gradle.python.plugin.PythonPlugin
    project.tasks.findByName(PythonPlugin.TASK_INSTALL_PYTHON_REQS).sorted = false

That disables sorting of the dependencies before the install, so they are installed in the order in which they appear in the direct dependencies closure.
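To see why the declared order matters once sorting is off, here is a small Python sketch of an order check. It is a toy model, not pygradle's implementation: the `BUILD_REQUIRES` table and the `first_violation` helper are hypothetical, and only the scikit-learn/scipy relation is taken from this thread.

```python
# Hypothetical table of build-time prerequisites. Only the
# scikit-learn -> scipy entry comes from this thread.
BUILD_REQUIRES = {"scikit-learn": ["scipy"]}


def first_violation(order, build_requires=BUILD_REQUIRES):
    """Return (package, missing_prerequisite) for the first package that
    would be installed before one of its prerequisites, or None if the
    order is safe."""
    installed = set()
    for package in order:
        for prerequisite in build_requires.get(package, []):
            if prerequisite not in installed:
                return (package, prerequisite)
        installed.add(package)
    return None


declared = ["scipy", "scikit-learn"]      # order written in build.gradle

print(first_violation(sorted(declared)))  # sorting left on
print(first_violation(declared))          # sorting switched off
```

With sorting left on, the check reports scikit-learn as arriving before scipy; with the declared order it reports no violation, which matches the behavior described above.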
Short feedback from my side: It works. Within the next few days, I will have a small write-up on this.
I've added the example as PR #87 - I hope this helps others get started.