Performance difference between using "brainpy.math.for_loop" and "model.jit_step_run"
In the documentation on monitoring every multiple steps, two methods are provided: one using brainpy.math.for_loop and the other using model.jit_step_run. I profiled the running speed of the two given examples and found that model.jit_step_run consistently runs faster than brainpy.math.for_loop (at least on my platform, on both CPU and GPU).
I am a bit surprised by this result, since using model.jit_step_run requires writing an explicit Python for-loop, which I would expect to be slow. What might be the reason behind the performance difference?
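To make the comparison concrete, here is a minimal pure-JAX sketch of the two looping styles I have in mind (my own illustration, not BrainPy code; I am assuming brainpy.math.for_loop lowers to something like jax.lax.scan, while model.jit_step_run is essentially a jitted single step driven by a Python loop):

import time
import jax
import jax.numpy as jnp

def step(carry, x):
    # Toy one-step dynamics standing in for one network update.
    new_carry = carry * 0.99 + x
    return new_carry, new_carry

xs = jnp.ones(10000)

# Style 1: the whole loop is traced and compiled as one program.
scan_fn = jax.jit(lambda c, xs: jax.lax.scan(step, c, xs))

# Style 2: only a single step is compiled; Python drives the loop
# and pays a dispatch cost on every iteration.
jit_step = jax.jit(step)

t0 = time.time()
_, ys = scan_fn(0.0, xs)
ys.block_until_ready()  # wait for asynchronous execution to finish
print('scan-style loop: {:.2f} s'.format(time.time() - t0))

t0 = time.time()
carry = 0.0
for x in xs:
    carry, _ = jit_step(carry, x)
carry.block_until_ready()
print('explicit Python loop: {:.2f} s'.format(time.time() - t0))

Which style wins should depend on how heavy each step is and how much one-off tracing/compilation the scan-style version pays, which is what I am trying to pin down below.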
Profile code:
import time
import numpy as np
import matplotlib.pyplot as plt
import brainpy as bp
import brainpy.math as bm

bm.set_platform('cpu')

#%%
class EINet(bp.DynSysGroup):
    def __init__(self):
        super().__init__()
        self.N = bp.dyn.LifRef(4000, V_rest=-60., V_th=-50., V_reset=-60., tau=20., tau_ref=5.,
                               V_initializer=bp.init.Normal(-55., 2.))
        self.delay = bp.VarDelay(self.N.spike, entries={'I': None})
        self.E = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(3200, 4000, prob=0.02, weight=0.6),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=5.),
                                         out=bp.dyn.COBA.desc(E=0.),
                                         post=self.N)
        self.I = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(800, 4000, prob=0.02, weight=6.7),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=10.),
                                         out=bp.dyn.COBA.desc(E=-80.),
                                         post=self.N)

    def update(self, input):
        spk = self.delay.at('I')
        self.E(spk[:3200])
        self.I(spk[3200:])
        self.delay(self.N(input))
        return self.N.spike.value

    def run(self, ids, inputs):  # the most important function! advances multiple steps per call
        for i, inp in zip(ids, inputs):
            bp.share.save(i=i, t=bm.get_dt() * i)  # update the shared step index and time
            self.update(inp)
        return self.N.spike.value
#%% brainpy.math.for_loop
n_step_per_monitor = 10
indices1 = np.arange(10000).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0

model = EINet()
start_time = time.time()
spks1 = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks1 = bm.as_numpy(spks1)
plt.figure()
bp.visualize.raster_plot(indices1[:, 0], spks1, show=True)
#%% model.jit_step_run
n_step_per_monitor = 10
indices2 = np.arange(10000)
inputs2 = np.ones_like(indices2) * 20.

model = EINet()
spks2 = []
start_time = time.time()
for i in indices2:
    model.jit_step_run(i, inputs2[i])
    if i % n_step_per_monitor == 0:
        spks2.append(model.N.spike.value)  # monitor spikes every n_step_per_monitor steps
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks2 = bm.as_numpy(spks2)
plt.figure()
bp.visualize.raster_plot(indices2[::n_step_per_monitor], spks2, show=True)
Outputs (for_loop first, then jit_step_run):
1.96 seconds
1.01 seconds
Even if I reverse the order of the two methods, the results are almost the same, so the difference is not caused by the JIT compilation time of the first run.
I guess the difference lies in the compilation time of brainpy.math.for_loop, but I will run more experiments to see what is behind the difference.
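One experiment I can think of: time a second for_loop call after a warm-up call, so that tracing and compilation are excluded (a sketch; I am assuming the compiled function is reused across calls, and I use bm.as_numpy to block until the asynchronous computation has actually finished):

model = EINet()
# Warm-up call: pays the tracing/compilation cost once.
_ = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)

start_time = time.time()
spks1b = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
spks1b = bm.as_numpy(spks1b)  # forces the computation to complete before stopping the timer
print('{:.2f} seconds (compilation excluded)'.format(time.time() - start_time))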
Actually, brainpy.math.for_loop will be faster if we increase the number of simulation steps from 10000 to 100000.
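This is consistent with the compilation guess: the one-off cost gets amortized over more steps. Scaling the benchmark only requires changing the index/input arrays, e.g.:

n_steps = 100000  # was 10000
indices1 = np.arange(n_steps).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0
indices2 = np.arange(n_steps)
inputs2 = np.ones_like(indices2) * 20.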