potrace ~2x speedup using numba decorator in one single place

Thanks a lot for this pure Python port of potrace!

Looking at cProfile results, I saw that findnext() is a performance hotspot. Introducing numba and adding a single numba.njit() decorator to findnext() gives a roughly 2x speedup. I am sure a lot more is possible; this is just the lowest-hanging fruit.

I did not touch the default installation and only added numba as an optional dependency. Users who need more performance can install it with pip install potracer[numba]. I updated the Readme.md accordingly.
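For anyone wanting to reproduce the optional-dependency pattern, here is a minimal sketch, not the actual code from this PR. The fallback decorator and the function first_set_pixel() are hypothetical names made up for illustration, and the loop is only a stand-in for a findnext-style scan, not potracer's real findnext().

```python
import numpy as np

try:
    # Use numba's JIT compiler when the optional extra is installed.
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        """Hypothetical fallback: leave the function undecorated."""
        # Supports both @njit and @njit(...) usage.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def wrap(func):
            return func

        return wrap


@njit
def first_set_pixel(bitmap):
    """Return (y, x) of the first nonzero pixel, or (-1, -1) if none.

    Illustrative stand-in for a findnext-style scan; assumes a 2D
    boolean NumPy array.
    """
    rows, cols = bitmap.shape
    for y in range(rows):
        for x in range(cols):
            if bitmap[y, x]:
                return y, x
    return -1, -1


bm = np.zeros((4, 5), dtype=np.bool_)
bm[2, 3] = True
print(first_set_pixel(bm))
```

With this shape, the module imports and runs the same way whether or not numba is present; only the speed differs. Note that the first call to a jitted function includes compilation time, so short benchmarks may understate the gain.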

Profiling results: I used this image to test: link, and was able to reduce the runtime from ~61 seconds to ~26 seconds (61.255 s vs. 26.091 s in the profiles below).

Before:

44505289 function calls (43271614 primitive calls) in 61.255 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    154/1    0.000    0.000   61.266   61.266 {built-in method builtins.exec}
        1    0.069    0.069   61.266   61.266 test.py:1(<module>)
        1    0.076    0.076   61.069   61.069 test.py:6(file_to_svg)
        1    0.000    0.000   60.926   60.926 potrace.py:39(trace)
        1    0.020    0.020   46.160   46.160 potrace.py:810(bm_to_pathlist)
     3993    2.480    0.001   44.978    0.011 potrace.py:644(findnext)            <-- ~45 seconds spent here
     3993    0.006    0.000   42.493    0.011 fromnumeric.py:1881(nonzero)
     3994    0.008    0.000   42.487    0.011 fromnumeric.py:53(_wrapfunc)
     3993   42.478    0.011   42.478    0.011 {method 'nonzero' of 'numpy.ndarray' objects}
        1    0.021    0.021   14.734   14.734 potrace.py:1921(process_path)
     2988    7.397    0.002   10.508    0.004 potrace.py:1169(_calc_lon)
     2988    0.513    0.000    2.736    0.001 potrace.py:1348(_bestpolygon)
  1185378    1.745    0.000    2.159    0.000 potrace.py:1305(penalty3)
 15840196    1.517    0.000    1.517    0.000 potrace.py:1007(xprod)
     3992    0.680    0.000    0.926    0.000 potrace.py:570(findpath)
  8628578    0.773    0.000    0.773    0.000 potrace.py:853(sign)
  ...

After adding @numba.njit() to findnext():

45923494 function calls (44621093 primitive calls) in 26.091 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    481/1    0.002    0.000   26.103   26.103 {built-in method builtins.exec}
        1    0.064    0.064   26.103   26.103 test.py:1(<module>)
        1    0.071    0.071   25.788   25.788 test.py:6(file_to_svg)
        1    0.000    0.000   25.650   25.650 potrace.py:40(trace)
        1    0.020    0.020   14.359   14.359 potrace.py:1923(process_path)
        1    0.026    0.026   11.139   11.139 potrace.py:812(bm_to_pathlist)
     2988    7.186    0.002   10.198    0.003 potrace.py:1171(_calc_lon)
     3993    9.534    0.002    9.534    0.002 potrace.py:645(findnext)            <-- ~10 seconds spent here
     2988    0.493    0.000    2.675    0.001 potrace.py:1350(_bestpolygon)
  1185378    1.709    0.000    2.120    0.000 potrace.py:1307(penalty3)
 15840196    1.466    0.000    1.466    0.000 potrace.py:1009(xprod)
       49    0.001    0.000    0.800    0.016 __init__.py:1(<module>)
     3992    0.569    0.000    0.787    0.000 potrace.py:571(findpath)
     2988    0.334    0.000    0.767    0.000 potrace.py:1143(_calc_sums)
  8628578    0.759    0.000    0.759    0.000 potrace.py:855(sign)
  ...
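Profiles like the two above can be collected with the standard library alone. The following is a rough sketch using cProfile and pstats; profile_call() and busy() are hypothetical helpers for illustration, and in practice the profiled function would be something like the file_to_svg() call from test.py.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time, as in the tables above, and keep the top 15 rows.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(15)
    return result, stream.getvalue()


def busy(n):
    """Placeholder workload; swap in the tracing call to profile it."""
    return sum(i * i for i in range(n))


result, report = profile_call(busy, 1000)
print(report)
```

The same ordering is also available from the command line with python -m cProfile -s cumulative test.py.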

May 09 '24 08:05 baurst

I probably won't be adding anything that depends on anything else. The main point of the port was that I couldn't get the Windows Python port to work at all. The speed is a sacrifice there, but vendoring the code with whatever speedups you find is likely to be worthwhile wherever speed is an issue.

May 13 '24 00:05 tatarize

Thank you for the quick reply!

In the meantime I have discovered vtracer, which is even faster and offers nice Python bindings (pip install vtracer).

I totally understand your hesitation to add any extra dependencies, which is why I made it completely optional in this PR. I just thought I'd add my profiling results and performance improvement as a PR in case anybody wants to take this pure Python approach further.

May 13 '24 12:05 baurst