Alternatives to updating conda environment on Job Execution

Open umesh-timalsina opened this issue 5 years ago • 2 comments

Currently, on job execution (if dependencies are specified), we clone the base environment, which is just a series of copy and move operations, and then update the cloned environment with the new dependencies. Rough timing (via console.time) shows that the update step takes the longest.

One alternative would be to check the environment file as follows:

  1. Check that the Python version is 3.7
  2. Check that only pip-installed dependencies are listed

If both checks pass, we could just install the dependencies using pip.
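A minimal Python sketch of the two checks, assuming the environment file has already been parsed into a dict (for example by a YAML parser). The function name `pip_only_packages`, the `base_python` parameter, and the exact rule for what counts as "pip only" are assumptions made for illustration, not existing DeepForge code.

```python
from typing import List, Optional


def pip_only_packages(env: dict, base_python: str = "3.7") -> Optional[List[str]]:
    """Return the pip packages if `env` qualifies for the pip shortcut, else None.

    `env` is a parsed conda environment file. Returning None means
    "fall back to the regular conda update".
    """
    pip_packages: List[str] = []
    for dep in env.get("dependencies") or []:
        if isinstance(dep, dict):
            # A nested mapping is only acceptable if it is the `pip:` section.
            if set(dep) != {"pip"}:
                return None
            pip_packages.extend(dep["pip"] or [])
        elif isinstance(dep, str):
            name, _, version = dep.partition("=")
            name = name.strip()
            version = version.lstrip("=").strip()
            if name == "pip":
                # Assumption: pip is already present in the cloned environment.
                continue
            if name == "python" and (
                version == base_python or version.startswith(base_python + ".")
            ):
                # Python is pinned to the version assumed for the base environment.
                continue
            # Any other conda dependency needs the conda solver.
            return None
        else:
            return None
    # An empty result also falls back (nothing to install with pip).
    return pip_packages or None


print(pip_only_packages({"dependencies": [{"pip": ["numpy"]}]}))
print(pip_only_packages({"dependencies": ["scipy", {"pip": ["numpy"]}]}))
```

The check is deliberately conservative: anything it does not recognize (an unpinned `python`, a range such as `python>=3.7`, any conda package) returns None so the existing conda path is used.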

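The following benchmark shows that installing numpy and pandas using pip is significantly faster than waiting for conda to resolve the environment: about 4.2 s of wall-clock time for the pip install versus about 19.7 s for conda env update (single runs on one machine).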

dependencies:
  - pip:
    - numpy

Example:

(base) umesh@isisdell:~$ time conda run -n deepforge-copy pip install numpy pandas
real	0m4.185s
user	0m2.503s
sys	0m0.348s
(base) umesh@isisdell:~$ time conda run -n deepforge-copy pip uninstall numpy --yes
real	0m0.802s
user	0m0.677s
sys	0m0.125s
(base) umesh@isisdell:~$ time conda env update -n deepforge-copy  --file update-file.yml
real	0m19.691s
user	0m17.325s
sys	0m1.232s
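Using the two commands timed above, the choice between the fast path and the regular path could be sketched like this. The function name `build_update_command` and its parameters are hypothetical; the sketch only builds the argument list and does not run conda.

```python
from typing import List, Optional


def build_update_command(
    env_name: str, env_file: str, pip_packages: Optional[List[str]]
) -> List[str]:
    """Build the argv for updating a cloned environment.

    If `pip_packages` is a non-empty list, use the pip shortcut;
    otherwise fall back to a full `conda env update`.
    """
    if pip_packages:
        # Fast path: install directly with pip inside the cloned environment.
        return ["conda", "run", "-n", env_name, "pip", "install", *pip_packages]
    # Regular path: let conda resolve the whole environment file.
    return ["conda", "env", "update", "-n", env_name, "--file", env_file]


print(build_update_command("deepforge-copy", "update-file.yml", ["numpy", "pandas"]))
print(build_update_command("deepforge-copy", "update-file.yml", None))
```

Returning an argument list rather than a shell string keeps package names from being interpreted by a shell, which matters if the dependency list comes from user-supplied files.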

umesh-timalsina, Jul 09 '20 17:07

Probably a good idea given the prevalence of pip dependencies. Kinda annoying to introduce a special case optimization like this though :(

brollb, Jul 09 '20 18:07

https://www.anaconda.com/blog/understanding-and-improving-condas-performance has some ideas on improving conda's performance. Very few apply to us.

umesh-timalsina, Jul 13 '20 14:07