pyElli icon indicating copy to clipboard operation
pyElli copied to clipboard

Expm is getting slow in `scipy==1.13.1`

Open domna opened this issue 1 year ago • 11 comments

With ~numpy==2.0.0~scipy==1.13.1 the performance of the solver drops tremendously. We should investigate where this is coming from and see if we can fix this by replacing to the proper (quicker) functions.

domna avatar Jun 19 '24 06:06 domna

I will take a look, I should have a profiling notebook somewhere.

MarJMue avatar Jun 19 '24 09:06 MarJMue

Interestingly if i run the benchmark locally (Fedora, Python 3.10 and 3.12) I don't get any performance regression. Maybe it's a Problem with the runner.

------------------------------------------------------------
Name (time in ms)          OPS(2.0.0)       OPS(1.26.4)     
------------------------------------------------------------
test_solver2x2           513.6902 (1.0)   422.0736 (1.0)    
test_solver4x4_linear    379.5551 (0.74)  292.9602 (0.69)   
test_solver4x4_eig        65.4104 (0.13)   49.6542 (0.12)   
test_solver4x4_expm        3.5880 (0.01)    3.5572 (0.01)   
------------------------------------------------------------

MarJMue avatar Jun 19 '24 12:06 MarJMue

I see issues locally, too

numpy 2

------------------------------------------------------------------------------------------ benchmark: 6 tests -----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean              StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.3913 (1.0)       11.2985 (1.30)       4.6075 (1.0)        2.4295 (3.04)       3.6041 (1.0)       1.4018 (4.11)          1;1  217.0365 (1.0)          10           1
test_solver4x4_linear             3.7497 (1.11)       8.9823 (1.04)       4.7736 (1.04)       1.6175 (2.02)       4.0946 (1.14)      0.9789 (2.87)          1;1  209.4871 (0.97)         10           1
test_formula_solver2x2            5.9750 (1.76)       8.6650 (1.0)        6.4390 (1.40)       0.8001 (1.0)        6.2300 (1.73)      0.3410 (1.0)           1;1  155.3033 (0.72)         10           1
test_solver4x4_eig               42.4544 (12.52)     67.9676 (7.84)      47.8688 (10.39)      8.3157 (10.39)     44.0447 (12.22)     6.4657 (18.96)         2;1   20.8905 (0.10)         10           1
test_solver4x4_expm             449.6212 (132.58)   540.5948 (62.39)    471.5749 (102.35)    28.4518 (35.56)    462.2354 (128.25)   20.4840 (60.07)         1;1    2.1206 (0.01)         10           1
test_formula_solver4x4_expm     457.4233 (134.88)   800.8481 (92.42)    506.0184 (109.82)   104.2926 (130.35)   477.2739 (132.43)   24.4018 (71.56)         1;1    1.9762 (0.01)         10           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

numpy 1.x

---------------------------------------------------------------------------------------- benchmark: 6 tests ----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.7060 (1.0)        5.8486 (1.0)        4.0890 (1.0)      0.6535 (1.06)       3.8295 (1.0)      0.4599 (1.05)          1;1  244.5583 (1.0)          10           1
test_solver4x4_linear             4.3303 (1.17)       5.9519 (1.02)       4.9078 (1.20)     0.6175 (1.0)        4.6249 (1.21)     1.2390 (2.83)          3;0  203.7588 (0.83)         10           1
test_formula_solver2x2            5.9571 (1.61)       8.4977 (1.45)       6.4565 (1.58)     0.7682 (1.24)       6.1007 (1.59)     0.4382 (1.0)           1;1  154.8831 (0.63)         10           1
test_solver4x4_eig              114.9540 (31.02)    132.4588 (22.65)    118.5942 (29.00)    6.1394 (9.94)     115.7738 (30.23)    1.1952 (2.73)          2;2    8.4321 (0.03)         10           1
test_solver4x4_expm             393.4410 (106.16)   413.0689 (70.63)    403.1043 (98.58)    5.6653 (9.17)     402.3795 (105.07)   8.5332 (19.47)         2;0    2.4807 (0.01)         10           1
test_formula_solver4x4_expm     397.5010 (107.26)   410.9472 (70.26)    402.0846 (98.33)    4.2841 (6.94)     401.0340 (104.72)   3.6075 (8.23)          3;1    2.4870 (0.01)         10           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

But mainly for the eigen value solver

domna avatar Jun 19 '24 12:06 domna

The eigenvalue solver is getting faster! In the CI benchmark, the Expm broke down by a factor of 4, which we both can not reproduce.

Edit: The culprit is SciPy. The regression was introduced in between 1.12 and 1.13.

MarJMue avatar Jun 19 '24 13:06 MarJMue

Oopsie, true. Sorry, just quickly checked on the side today and I just saw that something changed. I can confirm that scipy==0.12 is faster by a factor of 4 for me, too.

Here are the stats with numpy-1.26.4 and scipy-1.12.0:

----------------------------------------------------------------------------------------- benchmark: 6 tests -----------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean             StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_solver2x2                    3.6657 (1.0)        4.1063 (1.0)        3.7916 (1.0)       0.1357 (1.0)        3.7635 (1.0)       0.1749 (1.18)          1;0  263.7438 (1.0)          10           1
test_solver4x4_linear             4.4456 (1.21)       5.0880 (1.24)       4.6169 (1.22)      0.1938 (1.43)       4.5882 (1.22)      0.1486 (1.0)           1;1  216.5961 (0.82)         10           1
test_formula_solver2x2            5.8515 (1.60)       9.1948 (2.24)       6.4353 (1.70)      1.0004 (7.37)       6.0991 (1.62)      0.4927 (3.32)          1;1  155.3918 (0.59)         10           1
test_solver4x4_expm              81.0214 (22.10)     99.6861 (24.28)     87.8797 (23.18)     6.2004 (45.69)     84.9936 (22.58)     8.0749 (54.35)         3;0   11.3792 (0.04)         10           1
test_formula_solver4x4_expm      82.5994 (22.53)    197.2490 (48.04)    105.7262 (27.88)    34.1210 (251.45)    94.7500 (25.18)    28.3755 (190.97)        1;1    9.4584 (0.04)         10           1
test_solver4x4_eig              115.4885 (31.51)    126.0902 (30.71)    117.8930 (31.09)     3.9314 (28.97)    116.0587 (30.84)     0.9391 (6.32)          2;2    8.4823 (0.03)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I did not really find anything on a major change of expm in scipy 1.13 but there are some bugfixes on it in the release notes. Maybe we should build a minimal example just for expm and open an issue at scipy with it.

domna avatar Jun 19 '24 19:06 domna

Indeed when I run this

import time

import numpy as np
from scipy.linalg import expm

mat = np.array(
    [
        [0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 2.5774387j],
        [0.0 - 0.0j, 0.0 - 0.0j, 0.0 + 4.3402231j, 0.0 - 0.0j],
        [0.0 - 0.0j, 0.0 + 5.60367362j, 0.0 - 0.0j, 0.0 - 0.0j],
        [0.0 - 9.43618706j, 0.0 - 0.0j, 0.0 - 0.0j, 0.0 - 0.0j],
    ]
)

start_time = time.time()
for _ in range(10000):
    expm(mat)
print(f"{(time.time() - start_time) * 1000} ms")

I get ~360 ms for scipy < 1.13 and 1800 ms for scipy >=1.13.

domna avatar Jun 19 '24 20:06 domna

hmm... I had a look at the changes of expm between v1.13.1 and v1.12.0 and the only thing that changes was that there was an explicit formula for 2x2 matrices which has been removed. So there must be a change in some other function expm calls which slows it down.

domna avatar Jun 20 '24 06:06 domna

Yeah, it's not directly obvious what they changed. We can try to report the regression.

On the other hand, I did try out PyTorch again. They have improved performance massively. For me, it is comparable to the eigenvalue solver. Maybe we should make it an alternative again?

-------------------------------------------------
Name (time in ms)                     OPS           
-------------------------------------------------
test_solver2x2                   523.0784 (1.0)     
test_solver4x4_linear            476.9422 (0.91)    
test_formula_solver2x2           279.7011 (0.53)    
test_solver4x4_torch_expm         63.3393 (0.12)    
test_solver4x4_eig                67.1695 (0.13)    
test_solver4x4_expm                4.3523 (0.01)    
test_formula_solver4x4_expm        5.7128 (0.01)    
-------------------------------------------------

MarJMue avatar Jun 20 '24 08:06 MarJMue

Yeah, it's not directly obvious what they changed.

On the other hand, I did try out PyTorch again. They have improved performance massively. For me, it is comparable to the eigenvalue solver. Maybe we should make it an alternative again?

Yes, sounds good to have an alternative. However, I also tried to use torch and ran the above example and got even worse performance (~2500 ms I think it was). Can you share your code and libs? Then I'll try to reproduce it.

I would also open an issue at scipy to start a discussion there at least. Maybe someone there might just know what's going wrong.

domna avatar Jun 20 '24 08:06 domna

I have prepared a branch with Pytorch at #181

MarJMue avatar Jun 20 '24 08:06 MarJMue

https://github.com/scipy/scipy/issues/21078

domna avatar Jun 29 '24 07:06 domna

Fixed in Scipy 1.15

MarJMue avatar Jan 04 '25 14:01 MarJMue