
Performance difference between using "brainpy.math.for_loop" and "model.jit_step_run"


In the documentation on monitoring every multiple steps, two methods are provided: one using brainpy.math.for_loop and the other using model.jit_step_run. I have profiled the running speed of the two given examples and found that model.jit_step_run consistently runs faster than brainpy.math.for_loop (at least on my platform, on both CPU and GPU).

I am a bit surprised by this result, since model.jit_step_run requires writing an explicit Python for-loop, which I would expect to be slow. What might be the reason behind the performance difference?

Profiling code:

import time
import numpy as np
import matplotlib.pyplot as plt

import brainpy as bp
import brainpy.math as bm

bm.set_platform('cpu')

#%%
class EINet(bp.DynSysGroup):
    def __init__(self):
        super().__init__()
        self.N = bp.dyn.LifRef(4000, V_rest=-60., V_th=-50., V_reset=-60., tau=20., tau_ref=5.,
                               V_initializer=bp.init.Normal(-55., 2.))
        self.delay = bp.VarDelay(self.N.spike, entries={'I': None})
        self.E = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(3200, 4000, prob=0.02, weight=0.6),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=5.),
                                         out=bp.dyn.COBA.desc(E=0.),
                                         post=self.N)
        self.I = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(800, 4000, prob=0.02, weight=6.7),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=10.),
                                         out=bp.dyn.COBA.desc(E=-80.),
                                         post=self.N)
    
    def update(self, input):
        spk = self.delay.at('I')
        self.E(spk[:3200])
        self.I(spk[3200:])
        self.delay(self.N(input))
        return self.N.spike.value
    
    def run(self, ids, inputs):  # the most important function!
        for i, inp in zip(ids, inputs):
            bp.share.save(i=i, t=bm.get_dt() * i)
            self.update(inp)
        return self.N.spike.value

#%% brainpy.math.for_loop
n_step_per_monitor = 10
indices1 = np.arange(10000).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0

model = EINet()

start_time = time.time()
spks1 = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks1 = bm.as_numpy(spks1)

plt.figure()
bp.visualize.raster_plot(indices1[:,0], spks1, show=True)

#%% brainpy.math.jit
n_step_per_monitor = 10
indices2 = np.arange(10000)
inputs2 = np.ones_like(indices2) * 20.

model = EINet()

spks2 = []

start_time = time.time()
for i in indices2:
    model.jit_step_run(i, inputs2[i])

    if i % n_step_per_monitor == 0:  
        spks2.append(model.N.spike.value)  # monitor spikes every n_step_per_monitor steps

end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks2 = bm.as_numpy(spks2)

plt.figure()
bp.visualize.raster_plot(indices2[::n_step_per_monitor], spks2, show=True)

Outputs:

1.96 seconds  (brainpy.math.for_loop)
1.01 seconds  (model.jit_step_run)

Even if I reverse the order of the two methods, the results are almost the same, so the difference is not caused by the JIT compilation time during the first run.
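
For reference, one rough way to isolate the compilation overhead of brainpy.math.for_loop is to call it twice with identically shaped inputs and time each call separately. Whether the second call actually reuses the compiled computation depends on BrainPy/JAX caching internals, so this is only a sketch (the bm.as_numpy calls force the computation to finish before the timer stops):

model = EINet()
indices = np.arange(10000).reshape(-1, 10)
inputs = np.ones_like(indices) * 20.0

# First call: wall-clock time includes tracing and XLA compilation.
t0 = time.time()
bm.as_numpy(bm.for_loop(model.run, (indices, inputs), progress_bar=False))
t1 = time.time()

# Second call with identically shaped inputs: compilation may be cached.
t2 = time.time()
bm.as_numpy(bm.for_loop(model.run, (indices, inputs), progress_bar=False))
t3 = time.time()

print('first call : {:.2f} s (compile + run)'.format(t1 - t0))
print('second call: {:.2f} s (run only, if the compiled code is reused)'.format(t3 - t2))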

CloudyDory avatar Nov 28 '23 09:11 CloudyDory

I guess the difference lies in the compilation time of brainpy.math.for_loop, but I will run more experiments to see what is going on behind such a difference.
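
One rough way to check this guess is to enable JAX's compile logging while running either version; the jax_log_compiles flag should report every XLA compilation (the exact flag name and log format may differ across JAX versions, so treat this as a sketch):

import jax

# Ask JAX to log each XLA compilation (availability depends on the JAX version).
jax.config.update('jax_log_compiles', True)

model = EINet()
indices = np.arange(10000).reshape(-1, 10)
inputs = np.ones_like(indices) * 20.0

# Any "Compiling ..." messages emitted here correspond to compilation overhead
# that is included in the wall-clock times measured above.
_ = bm.for_loop(model.run, (indices, inputs), progress_bar=False)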

chaoming0625 avatar Nov 29 '23 02:11 chaoming0625

Actually, brainpy.math.for_loop becomes faster if we increase the number of simulation time steps from 10,000 to 100,000.
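
A quick scaling check along these lines, reusing the EINet class and timing pattern from the original post (a sketch; absolute numbers will of course depend on the platform):

# Rough scaling experiment: time both approaches at two simulation lengths.
n_step_per_monitor = 10
for n_steps in (10000, 100000):
    indices = np.arange(n_steps).reshape(-1, n_step_per_monitor)
    inputs = np.ones_like(indices) * 20.0

    model = EINet()
    t0 = time.time()
    spks = bm.for_loop(model.run, (indices, inputs), progress_bar=False)
    bm.as_numpy(spks)  # force the computation to finish before stopping the timer
    t_for_loop = time.time() - t0

    model = EINet()
    t0 = time.time()
    for i in range(n_steps):
        model.jit_step_run(i, 20.0)
    bm.as_numpy(model.N.spike.value)  # sync before stopping the timer
    t_step_run = time.time() - t0

    print('{:6d} steps: for_loop {:.2f} s, jit_step_run {:.2f} s'.format(
        n_steps, t_for_loop, t_step_run))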

CloudyDory avatar Jan 29 '24 04:01 CloudyDory