
Has anyone managed to make one of the RL examples work?

Open ValerioB88 opened this issue 3 years ago • 2 comments

Since there was this bug preventing the RL example (breakout_stdp.py) from working at all, I am pretty confident that the script was never actually tested (unless the bug was introduced later on; I haven't checked).

However, I noticed that fixing that bug doesn't make the network learn the task.

I am wondering whether anyone has managed to make the breakout_stdp.py script work (that is, get the network to actually solve the task). Maybe it's just a matter of finding the right hyperparameters?

I have also tried another gym task (CartPole), but I haven't been successful with that either. To get the input spikes, I convert the observation values into [0, 1] values and pass them to the Bernoulli encoding. It still doesn't work. Any ideas would be really appreciated. Thanks!
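Roughly, the encoding step is something like this (a minimal sketch; the sigmoid is just one possible squashing and time=100 is an arbitrary window, not necessarily the right choices):

import torch
from bindsnet.encoding import bernoulli

def encode_observation(obs, time=100):
    # Squash the 4 raw CartPole observation values into [0, 1];
    # the sigmoid here is one arbitrary choice of squashing.
    datum = torch.sigmoid(torch.as_tensor(obs, dtype=torch.float))
    # bernoulli() treats each value as a per-timestep spike probability
    # and returns a [time, 4] binary spike tensor for the Input layer.
    return bernoulli(datum=datum, time=time)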

ValerioB88 avatar Sep 07 '22 09:09 ValerioB88

I think you're correct. The only working models we came up with used MSTDPET (reward-modulated STDP with an eligibility trace), not MSTDP (reward-modulated STDP), on Atari-like games. The eligibility trace allows the reward to apply to (recent) past actions. It has been used with some success on solving Pong.
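Schematically, per synapse, the difference is (a toy scalar version of the rule, not bindsnet's actual implementation; the names and default values are mine):

import math

def mstdpet_step(e, stdp_term, reward, nu=1e-2, dt=1.0, tc_e_trace=25.0):
    # Decay the eligibility trace, then fold in the instantaneous STDP term.
    e = e * math.exp(-dt / tc_e_trace) + stdp_term
    # The reward modulates the accumulated trace, not just the current
    # STDP event, so recent past activity still gets credit.
    dw = nu * reward * e * dt
    return e, dw

With plain MSTDP, by contrast, dw = nu * reward * stdp_term, so a reward that arrives even a few steps after the decisive spikes modulates nothing.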

But CartPole is a more time-continuous task, and MSTDP should work, to some extent. What is your current model design?

SimonInParis avatar Sep 07 '22 10:09 SimonInParis

Hi @SimonInParis, thank you for answering. I see that the only example in the repo using MSTDPET is the Dot Tracing one. When you say that MSTDPET works on Atari-like games, do you mean I could just change the learning rule to MSTDPET in breakout_stdp.py and check if it works, or are there more hyperparameters to change? I also think the breakout_stdp.py example in the repo should be updated to use whatever learning rule and hyperparameters actually work (happy to do that once I get a working version).
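If it is just the one-line swap, I guess it would look like this (a sketch using the example's topology; nu, tc_e_trace and norm are guesses/defaults, not values anyone has confirmed to work):

from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDPET  # instead of MSTDP

network = Network(dt=1.0)
inpt = Input(n=80 * 80, traces=True)
middle = LIFNodes(n=100, traces=True)
out = LIFNodes(n=4, refrac=0, traces=True)
network.add_layer(inpt, name="X")
network.add_layer(middle, name="Y")
network.add_layer(out, name="Z")

network.add_connection(
    Connection(source=inpt, target=middle, wmin=0, wmax=1e-1),
    source="X", target="Y",
)
network.add_connection(
    Connection(
        source=middle,
        target=out,
        update_rule=MSTDPET,   # the one-line swap from MSTDP
        nu=1e-2,               # learning rate: a guess, not a tuned value
        tc_e_trace=25.0,       # eligibility-trace time constant (the default)
        norm=0.5 * middle.n,
    ),
    source="Y", target="Z",
)

# The reward still enters through network.run at each environment step:
# network.run(inputs={"X": spikes}, time=time, reward=reward)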

I also thought that CartPole would be fairly easy to solve, but alas, it just doesn't learn. The network is as simple as the breakout_stdp one:

from bindsnet.network.nodes import Input, LIFNodes

inpt = Input(n=n_input, traces=True)               # n_input = 4 observations
middle = LIFNodes(n=n_hidden, traces=True)         # n_hidden = 50
out = LIFNodes(n=n_output, refrac=0, traces=True)  # n_output = 2 actions

with 4 inputs, 50 hidden neurons, and 2 outputs (though I have tried many variations of those sizes). I have tried changing wmin and wmax, and even allowing negative weights and biases, with no luck.
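For completeness, the output connection is wired like in the breakout example, e.g. (the bounds here are just examples of what I tried; nu is a guess):

from bindsnet.network.topology import Connection
from bindsnet.learning import MSTDP

middle_out = Connection(
    source=middle,
    target=out,
    update_rule=MSTDP,
    nu=1e-2,
    wmin=-1.0,  # allowing negative (inhibitory) weights was one variation
    wmax=1.0,
)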

I am not bound to this particular model, or to any particular learning rule, or even to this specific gym env. I want to develop a research idea with spiking nets, but it hinges on having at least one model that can learn in an RL setup.

I will try MSTDPET and report back, but in the meantime any other ideas are welcome.

ValerioB88 avatar Sep 07 '22 11:09 ValerioB88