Performance difference between using "brainpy.math.for_loop" and "model.jit_step_run"
In the documentation on monitoring every multiple steps, two methods are provided: one using brainpy.math.for_loop and the other using model.jit_step_run. I profiled the running speed of the two given examples and found that model.jit_step_run consistently runs faster than brainpy.math.for_loop (at least on my platform, on both CPU and GPU).
I am a bit surprised by this result, since using model.jit_step_run requires writing an explicit Python for-loop, which I would expect to be slow. What might be the reason behind the performance difference?
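To make the comparison concrete, here is a minimal pure-JAX sketch of the two looping styles I have in mind (my own illustration, not BrainPy code; I am assuming brainpy.math.for_loop lowers to something like jax.lax.scan, while model.jit_step_run is essentially a jitted single step driven by a Python loop):

import time
import jax
import jax.numpy as jnp

def step(carry, x):
    # Toy one-step dynamics standing in for one network update.
    new_carry = carry * 0.99 + x
    return new_carry, new_carry

xs = jnp.ones(10000)

# Style 1: the whole loop is traced and compiled as one program.
scan_fn = jax.jit(lambda c, xs: jax.lax.scan(step, c, xs))

# Style 2: only a single step is compiled; Python drives the loop
# and pays a dispatch cost on every iteration.
jit_step = jax.jit(step)

t0 = time.time()
_, ys = scan_fn(0.0, xs)
ys.block_until_ready()  # wait for asynchronous execution to finish
print('scan-style loop: {:.2f} s'.format(time.time() - t0))

t0 = time.time()
carry = 0.0
for x in xs:
    carry, _ = jit_step(carry, x)
carry.block_until_ready()
print('explicit Python loop: {:.2f} s'.format(time.time() - t0))

Which style wins should depend on how heavy each step is and how much one-off tracing/compilation the scan-style version pays, which is what I am trying to pin down below.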
Profile code:
import time
import numpy as np
import matplotlib.pyplot as plt
import brainpy as bp
import brainpy.math as bm

bm.set_platform('cpu')

#%%
class EINet(bp.DynSysGroup):
    def __init__(self):
        super().__init__()
        self.N = bp.dyn.LifRef(4000, V_rest=-60., V_th=-50., V_reset=-60., tau=20., tau_ref=5.,
                               V_initializer=bp.init.Normal(-55., 2.))
        self.delay = bp.VarDelay(self.N.spike, entries={'I': None})
        self.E = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(3200, 4000, prob=0.02, weight=0.6),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=5.),
                                         out=bp.dyn.COBA.desc(E=0.),
                                         post=self.N)
        self.I = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(800, 4000, prob=0.02, weight=6.7),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=10.),
                                         out=bp.dyn.COBA.desc(E=-80.),
                                         post=self.N)

    def update(self, input):
        spk = self.delay.at('I')
        self.E(spk[:3200])
        self.I(spk[3200:])
        self.delay(self.N(input))
        return self.N.spike.value

    def run(self, ids, inputs):  # the most important function! advances multiple steps per call
        for i, inp in zip(ids, inputs):
            bp.share.save(i=i, t=bm.get_dt() * i)  # update the shared step index and time
            self.update(inp)
        return self.N.spike.value
#%% brainpy.math.for_loop
n_step_per_monitor = 10
indices1 = np.arange(10000).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0

model = EINet()
start_time = time.time()
spks1 = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks1 = bm.as_numpy(spks1)
plt.figure()
bp.visualize.raster_plot(indices1[:, 0], spks1, show=True)
#%% model.jit_step_run
n_step_per_monitor = 10
indices2 = np.arange(10000)
inputs2 = np.ones_like(indices2) * 20.

model = EINet()
spks2 = []
start_time = time.time()
for i in indices2:
    model.jit_step_run(i, inputs2[i])
    if i % n_step_per_monitor == 0:
        spks2.append(model.N.spike.value)  # monitor spikes every n_step_per_monitor steps
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks2 = bm.as_numpy(spks2)
plt.figure()
bp.visualize.raster_plot(indices2[::n_step_per_monitor], spks2, show=True)
Outputs (for_loop first, then jit_step_run):
1.96 seconds
1.01 seconds
Even if I reverse the order of the two methods, the results are almost the same, so the difference is not caused by the JIT compilation time of the first run.
I guess the difference lies in the compilation time of brainpy.math.for_loop, but I will run more experiments to see what is behind the difference.
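One experiment I can think of: time a second for_loop call after a warm-up call, so that tracing and compilation are excluded (a sketch; I am assuming the compiled function is reused across calls, and I use bm.as_numpy to block until the asynchronous computation has actually finished):

model = EINet()
# Warm-up call: pays the tracing/compilation cost once.
_ = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)

start_time = time.time()
spks1b = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
spks1b = bm.as_numpy(spks1b)  # forces the computation to complete before stopping the timer
print('{:.2f} seconds (compilation excluded)'.format(time.time() - start_time))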
Actually, brainpy.math.for_loop will be faster if we increase the number of simulation steps from 10000 to 100000.
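This is consistent with the compilation guess: the one-off cost gets amortized over more steps. Scaling the benchmark only requires changing the index/input arrays, e.g.:

n_steps = 100000  # was 10000
indices1 = np.arange(n_steps).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0
indices2 = np.arange(n_steps)
inputs2 = np.ones_like(indices2) * 20.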