Get module load time down
Currently about 7 seconds, 80% compilation
Now down to 3.5 seconds, 70% compilation, by using a custom meta format instead of JSON.
Now 2.5 seconds, 80% compilation, I think from lowering the optimisation level in CondaPkg and MicroMamba.
I get:
julia> @time using PythonCall
6.009497 seconds (4.98 M allocations: 296.350 MiB, 1.55% gc time, 89.52% compilation time)
Even so, the package, CondaPkg and MicroMamba are all at their latest versions (I didn't check all dependencies). Is your machine simply that much faster?
I get it down to 1 s with --compile=min (I know you don't want that, understandably).
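For reference, this is how I'm timing it under those flags (plain CLI one-liners, nothing assumed beyond the flags themselves):
julia -O1 -e '@time using PythonCall'
julia --compile=min -e '@time using PythonCall'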
This is what I get with 1.8.3 on linux:
julia> @time using PythonCall
8.890879 seconds (7.62 M allocations: 441.547 MiB, 5.14% gc time, 88.10% compilation time: 72% of which was recompilation)
Not a blocker for transitioning Plots' matplotlib backend to PythonPlot in https://github.com/JuliaPlots/Plots.jl/pull/4542 (see https://github.com/JuliaPlots/Plots.jl/pull/4542#issuecomment-1326595934), but reducing the load time to reach parity with PyPlot / PyCall would be nice, since I intend to make the pythonplot backend supersede the pyplot one.
I'm not at my computer, but I think the import time for me is currently about 3.5 seconds - so it's gone up, but apparently still faster than on your machine. I'm on Windows with an SSD..?
Load time is low on my personal priority list, but I investigate it now and then. Help investigating the slow bits would be appreciated.
Linux, SSD too:
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, sandybridge)
Threads: 1 on 32 virtual cores
Wait, I think the previous timings were wrong, I might have had a JULIA_PYTHONCALL_EXE=@PyCall in my environment :/
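(For context, that variable has to be set before PythonCall is loaded; a minimal sketch of the two configurations compared below:)
# set before `using PythonCall`, e.g. in startup.jl or exported in the shell
ENV["JULIA_PYTHONCALL_EXE"] = "@PyCall"   # reuse PyCall's Python, which also loads PyCall
# delete!(ENV, "JULIA_PYTHONCALL_EXE")    # or leave it unset and let CondaPkg provide Python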
With JULIA_PYTHONCALL_EXE=@PyCall:
julia> @time @time_imports using PythonCall
53.7 ms MacroTools
0.2 ms DataValueInterfaces
1.7 ms DataAPI
0.1 ms IteratorInterfaceExtensions
0.1 ms TableTraits
11.6 ms OrderedCollections
21.6 ms Tables
0.4 ms Requires
0.3 ms Scratch
35.7 ms Preferences
0.5 ms JLLWrappers
5.5 ms micromamba_jll 90.29% compilation time
0.3 ms MicroMamba
0.1 ms SnoopPrecompile
87.6 ms Parsers 12.07% compilation time
14.6 ms StructTypes
83.4 ms JSON3
0.4 ms Pidfile
21.6 ms CondaPkg 87.25% compilation time
14.9 ms UnsafePointers
19.9 ms VersionParsing
24.7 ms JSON
9.0 ms Conda
945.2 ms PyCall 78.96% compilation time (6% recompilation)
8548.2 ms PythonCall 92.25% compilation time (74% recompilation)
9.182004 seconds (8.43 M allocations: 482.731 MiB, 5.13% gc time, 88.68% compilation time: 71% of which was recompilation)
With JULIA_PYTHONCALL_EXE=[...]/bin/python:
Now, closer to your timings (my Xeon CPU is not recent):
julia> @time @time_imports using PythonCall
42.9 ms MacroTools
0.1 ms DataValueInterfaces
1.4 ms DataAPI
0.1 ms IteratorInterfaceExtensions
0.1 ms TableTraits
9.2 ms OrderedCollections
16.8 ms Tables
0.3 ms Requires
0.2 ms Scratch
27.9 ms Preferences
0.4 ms JLLWrappers
4.0 ms micromamba_jll 87.60% compilation time
0.3 ms MicroMamba
0.1 ms SnoopPrecompile
90.1 ms Parsers 12.72% compilation time
14.4 ms StructTypes
83.8 ms JSON3
0.5 ms Pidfile
21.7 ms CondaPkg 87.59% compilation time
14.9 ms UnsafePointers
3341.8 ms PythonCall 85.17% compilation time (<1% recompilation)
3.905634 seconds (3.16 M allocations: 210.015 MiB, 2.20% gc time, 78.55% compilation time: 0% of which was recompilation)
This is a bit peculiar because the numbers do not simply add up, since the impact of loading PyCall on its own is around 1.5s (all measurements made in a fresh session, of course):
julia> @time @time_imports using PyCall
23.0 ms VersionParsing
0.2 ms SnoopPrecompile
105.4 ms Parsers 12.46% compilation time
22.7 ms JSON
8.2 ms Conda
38.6 ms MacroTools
894.5 ms PyCall 80.50% compilation time
1.360677 seconds (1.69 M allocations: 127.477 MiB, 3.09% gc time, 70.89% compilation time: 1% of which was recompilation)
We would expect about 3.9s + 1.4s ~= 5.3s, not 9.2s. Maybe some invalidations are occurring?
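If it is invalidations, one way to check would be something like this (a sketch using SnoopCompile; I haven't run it on this exact setup):
using SnoopCompileCore
invalidations = @snoopr using PythonCall   # in a fresh session, with JULIA_PYTHONCALL_EXE=@PyCall set
using SnoopCompile
trees = invalidation_trees(invalidations)  # lists which methods got invalidated, and by what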
Ah interesting. Maybe invalidations yeah, not sure what though.
Or maybe it's Requires.jl, which is used to check compatibility with PyCall when it's loaded - but will probably require compilation when used.
Maybe Requires, yes, since we also use it in Plots to load the glue code for each backend.
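This is roughly the Requires.jl pattern in question (a schematic sketch; the @require line is the one quoted further down, the surrounding module is illustrative):
module SomePackage   # illustrative stand-in

using Requires

function __init__()
    # only evaluated once PyCall is loaded in the same session; the glue code
    # is therefore compiled at that point rather than during precompilation
    @require PyCall="438e738f-606a-5dbb-bf0a-cddfbfd45ab0" init_pycall(PyCall)
end

init_pycall(PyCall) = nothing   # stand-in for the real compatibility code

end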
The following is disappointing: without setting JULIA_PYTHONCALL_EXE=@PyCall, the timings in https://github.com/JuliaPlots/Plots.jl/pull/4542#issuecomment-1326595934 remain very similar, with a ~6s overhead for pythonplot w.r.t. pyplot.
I'll try to investigate.
EDIT: Ok, so PyCall is still loaded, and that is weird, because it shouldn't be.
julia> using Plots
julia> @time pythonplot()
75.2 ms MacroTools
0.2 ms DataValueInterfaces
0.2 ms IteratorInterfaceExtensions
0.1 ms TableTraits
37.8 ms Tables
0.5 ms micromamba_jll
0.3 ms MicroMamba
23.0 ms StructTypes
161.1 ms JSON3
0.5 ms Pidfile
83.3 ms CondaPkg 96.11% compilation time (78% recompilation)
25.9 ms UnsafePointers
4636.6 ms PythonCall 84.21% compilation time (32% recompilation)
17.3 ms VersionParsing
9.5 ms Conda
979.3 ms PyCall 70.39% compilation time (6% recompilation)
1.5 ms LaTeXStrings
4092.8 ms PythonPlot 77.86% compilation time (36% recompilation)
22.286516 seconds (23.90 M allocations: 1.338 GiB, 3.07% gc time, 86.22% compilation time: 43% of which was recompilation)
Narrowed it down to PythonPlot loading PyCall (but again, why??):
@time @time_imports using PythonPlot
49.9 ms MacroTools
0.2 ms DataValueInterfaces
2.3 ms DataAPI
0.1 ms IteratorInterfaceExtensions
0.1 ms TableTraits
23.1 ms Tables
0.4 ms Requires
0.3 ms Scratch
35.7 ms Preferences
0.5 ms JLLWrappers
4.8 ms micromamba_jll 88.75% compilation time
0.3 ms MicroMamba
0.1 ms SnoopPrecompile
93.4 ms Parsers 11.61% compilation time
16.2 ms StructTypes
233.8 ms JSON3
0.5 ms Pidfile
22.0 ms CondaPkg 86.64% compilation time
17.2 ms UnsafePointers
3400.1 ms PythonCall 86.29% compilation time (<1% recompilation)
15.6 ms VersionParsing
21.9 ms JSON
8.7 ms Conda
880.6 ms PyCall 77.36% compilation time (6% recompilation)
94.8 ms FixedPointNumbers
0.2 ms Reexport
118.4 ms ColorTypes 6.93% compilation time
366.3 ms Colors
1.1 ms LaTeXStrings
4195.4 ms PythonPlot 79.97% compilation time (5% recompilation)
13.508075 seconds (13.46 M allocations: 753.768 MiB, 5.51% gc time, 80.11% compilation time: 34% of which was recompilation)
A question: why would you want to @require PyCall="438e738f-606a-5dbb-bf0a-cddfbfd45ab0" init_pycall(PyCall), if JULIA_PYTHONCALL_EXE is explicitly set by the user?
My bad, probably a stale precompilation cache:
julia> using PythonCall, PythonPlot
julia> rm.(Base.find_all_in_cache_path(Base.module_keys[PythonCall]));
julia> rm.(Base.find_all_in_cache_path(Base.module_keys[PythonPlot]));
# restart
julia> using PythonPlot # precompiles
# restart
julia> using Plots
julia> @time pythonplot();
16.612395 seconds (15.54 M allocations: 944.050 MiB, 2.86% gc time, 84.67% compilation time: 31% of which was recompilation)
Parity with pyplot (~16.1s) :tada:.
I just downloaded a Julia 1.10 master build, but it's a 2-day-old master. I think it may be too old to test; I believe an important recently merged PR is missing from my build. I'll test with the next updated master, and hopefully the problem will then be fixed to a large degree, so this is not as urgent to work on.
Anyway, I get 5.977690 seconds on my (somewhat loaded) machine.
It's getting better, down to:
julia> @time using PythonCall
2.297766 seconds (1.46 M allocations: 91.242 MiB, 5.38% gc time, 1.60% compilation time)
You can get 1.81 s with -O1, which I think is OK to set at the module level, or even lower (1.332 s) with --compile=min, but you likely don't want to set that at the module level.
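A sketch of the module-level knob I mean (it's Base.Experimental, so subject to change, and whether it actually helps here is untested):
module SomeHeavyModule   # hypothetical name, just to show where the annotation goes
Base.Experimental.@optlevel 1   # compile this module's code as if julia were started with -O1
# ... rest of the module ...
end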
julia> @time_imports using PythonCall # these timings always come out lower; I assume the total is correct and includes time for the dependencies. Note it will be even lower with julia -O1, which is what I'm showing here:
17.8 ms MacroTools
0.2 ms DataValueInterfaces
1.3 ms DataAPI
0.2 ms IteratorInterfaceExtensions
0.1 ms TableTraits
9.8 ms OrderedCollections
39.6 ms Tables
0.4 ms Requires
0.3 ms Scratch
17.4 ms Preferences
0.4 ms JLLWrappers
2.6 ms micromamba_jll 83.14% compilation time
0.3 ms MicroMamba
0.2 ms SnoopPrecompile
77.5 ms Parsers
10.5 ms StructTypes
77.3 ms JSON3
0.4 ms Pidfile
18.6 ms CondaPkg 33.29% compilation time
16.1 ms UnsafePointers
1396.3 ms PythonCall
Would adding the weakdeps feature to Project.toml possibly help?
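In case it helps, this is roughly what that mechanism (Julia 1.9+ weak dependencies / package extensions) looks like in Project.toml; the extension name here is made up:
[weakdeps]
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"

[extensions]
# the extension module is loaded only when both the package and PyCall are in the session
PythonCallPyCallExt = "PyCall"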