~2x speedup by adding a numba decorator in a single place
Thanks a lot for this pure python port of potrace!
Looking at the cProfile results, I saw that findnext() is a performance hotspot: almost all of its time goes into numpy's nonzero(). By adding numba and a single numba.njit() decorator on findnext(), I got a ~2x overall speedup. I'm sure a lot more is possible; this is just the lowest of the low-hanging fruit.
I did not change the default installation; numba is only added as an optional dependency. Users who need more performance can install it with pip install potracer[numba]. I updated the Readme.md accordingly.
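For anyone who wants to reuse the idea elsewhere: a common way to keep numba optional is an import fallback, so the decorator silently becomes a no-op when numba is not installed. The sketch below shows that pattern. HAVE_NUMBA and first_set_index are illustrative names, not necessarily what this PR or potracer use; first_set_index is a hypothetical early-exit loop of the kind numba compiles well (unlike np.nonzero, it stops at the first hit instead of collecting every index).

```python
import numpy as np

try:
    # Present only when installed with the optional extra (pip install potracer[numba]).
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def njit(*args, **kwargs):
        """No-op stand-in for numba.njit: supports both @njit and @njit(...)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as bare @njit: return the function unchanged

        def decorator(func):
            return func  # used as @njit(...): leave func as plain Python

        return decorator


@njit()
def first_set_index(flat):
    """Hypothetical helper: index of the first non-zero entry of a 1-D array, or -1.

    As plain Python this loop is much slower per element than NumPy's C code;
    under numba it is compiled to a native loop that exits at the first hit.
    """
    for i in range(flat.shape[0]):
        if flat[i]:
            return i
    return -1
```

Without numba, first_set_index(np.array([0, 0, 1])) still returns 2; it just runs as ordinary Python.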
Profiling results: I tested with this image (link) and was able to reduce the runtime from ~60 seconds to ~26 seconds.
Before:
44505289 function calls (43271614 primitive calls) in 61.255 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
154/1 0.000 0.000 61.266 61.266 {built-in method builtins.exec}
1 0.069 0.069 61.266 61.266 test.py:1(<module>)
1 0.076 0.076 61.069 61.069 test.py:6(file_to_svg)
1 0.000 0.000 60.926 60.926 potrace.py:39(trace)
1 0.020 0.020 46.160 46.160 potrace.py:810(bm_to_pathlist)
3993 2.480 0.001 44.978 0.011 potrace.py:644(findnext) <-- ~45 seconds spent here
3993 0.006 0.000 42.493 0.011 fromnumeric.py:1881(nonzero)
3994 0.008 0.000 42.487 0.011 fromnumeric.py:53(_wrapfunc)
3993 42.478 0.011 42.478 0.011 {method 'nonzero' of 'numpy.ndarray' objects}
1 0.021 0.021 14.734 14.734 potrace.py:1921(process_path)
2988 7.397 0.002 10.508 0.004 potrace.py:1169(_calc_lon)
2988 0.513 0.000 2.736 0.001 potrace.py:1348(_bestpolygon)
1185378 1.745 0.000 2.159 0.000 potrace.py:1305(penalty3)
15840196 1.517 0.000 1.517 0.000 potrace.py:1007(xprod)
3992 0.680 0.000 0.926 0.000 potrace.py:570(findpath)
8628578 0.773 0.000 0.773 0.000 potrace.py:853(sign)
...
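In the listing above, findnext() spends about 42.5 of its 45 seconds inside ndarray.nonzero(), which builds an index for every set pixel on each of its 3993 calls. Assuming findnext() only needs the first set pixel, a dependency-free alternative (not part of this PR and not benchmarked here) is np.argmax on the boolean bitmap, which returns the position of the first True without building that index array. find_first_set is a hypothetical name, and the sketch assumes row-major scan order, which may differ from potracer's actual search order.

```python
import numpy as np


def find_first_set(bm):
    """Hypothetical numpy-only helper: (row, col) of the first set pixel in
    row-major (C) order, or None if no pixel is set."""
    bm = np.asarray(bm, dtype=bool)  # no copy if bm is already a bool array
    flat = bm.ravel()  # a view when possible, so no extra allocation
    idx = int(np.argmax(flat))  # argmax returns the first occurrence of the maximum (True)
    if not flat[idx]:
        return None  # argmax returns 0 for an all-False array
    return divmod(idx, bm.shape[1])  # flat index -> (row, col)
```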
After adding @numba.njit() to findnext():
45923494 function calls (44621093 primitive calls) in 26.091 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
481/1 0.002 0.000 26.103 26.103 {built-in method builtins.exec}
1 0.064 0.064 26.103 26.103 test.py:1(<module>)
1 0.071 0.071 25.788 25.788 test.py:6(file_to_svg)
1 0.000 0.000 25.650 25.650 potrace.py:40(trace)
1 0.020 0.020 14.359 14.359 potrace.py:1923(process_path)
1 0.026 0.026 11.139 11.139 potrace.py:812(bm_to_pathlist)
2988 7.186 0.002 10.198 0.003 potrace.py:1171(_calc_lon)
3993 9.534 0.002 9.534 0.002 potrace.py:645(findnext) <-- ~10 seconds spent here
2988 0.493 0.000 2.675 0.001 potrace.py:1350(_bestpolygon)
1185378 1.709 0.000 2.120 0.000 potrace.py:1307(penalty3)
15840196 1.466 0.000 1.466 0.000 potrace.py:1009(xprod)
49 0.001 0.000 0.800 0.016 __init__.py:1(<module>)
3992 0.569 0.000 0.787 0.000 potrace.py:571(findpath)
2988 0.334 0.000 0.767 0.000 potrace.py:1143(_calc_sums)
8628578 0.759 0.000 0.759 0.000 potrace.py:855(sign)
...
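The listings above look like the output of python -m cProfile -s cumulative test.py (note the top-level builtins.exec and test.py:1(<module>) rows). To profile only the tracing call from inside a script, here is a minimal standard-library sketch; profile_cumulative is an illustrative name, and file_to_svg in the usage comment stands in for the function in test.py, whose contents are not shown here.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, *args, limit=15, **kwargs):
    """Run func under cProfile; return (result, report), where report is the
    stats table sorted by cumulative time, like the listings above."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


# Hypothetical usage, with file_to_svg standing in for the function in test.py:
# _, report = profile_cumulative(file_to_svg, "input.png")
# print(report)
```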
I probably won't add anything that depends on other packages. The main point of the port was that I couldn't get the Windows Python port to work at all. Speed is the sacrifice there, but where speed matters, vendoring the code with whatever speedups you find is likely going to be worthwhile.
Thank you for the quick reply!
In the meantime I have discovered vtracer, which is even faster and offers nice Python bindings (pip install vtracer).
I totally understand your hesitation to add extra dependencies; that's why I made numba completely optional in this PR. I just thought I'd share my profiling results and the performance improvement as a PR in case anybody wants to take this pure-Python approach further.