Has anyone managed to make one of the RL examples work?
Since there was a bug preventing the RL example (breakout_stdp.py) from working, I am fairly confident that script was never actually tested (unless the bug was introduced later on; I haven't checked).
However, I noticed that fixing that bug doesn't make the network learn the task.
I am wondering whether anyone has managed to make the breakout_stdp.py script work (that is, get the network to actually solve the task). Maybe it's just a matter of finding the right hyperparameters?
I have also tried another gym task (CartPole) but haven't been successful with it either. To get the input spikes I scale the observation values to [0, 1] and pass them to the bernoulli encoding. It still doesn't learn. Any ideas would be really appreciated. Thanks!
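For reference, here is roughly what my encoding step looks like (a minimal sketch; the clipping bounds for the unbounded velocity dimensions are arbitrary choices on my part):

import torch
from bindsnet.encoding import bernoulli

# CartPole observation: [cart pos, cart vel, pole angle, pole angular vel].
# Velocities are unbounded in gym, so I clip to hand-picked ranges first.
low = torch.tensor([-2.4, -3.0, -0.21, -3.0])
high = torch.tensor([2.4, 3.0, 0.21, 3.0])

def encode(obs, time=1):
    x = torch.as_tensor(obs, dtype=torch.float32)
    x = torch.max(torch.min(x, high), low)   # clip to the bounds above
    x = (x - low) / (high - low)             # scale to [0, 1] spike probabilities
    return bernoulli(x, time=time)           # [time, 4] spike tensor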
I think you're correct. The only working models we came up with used MSTDPET (reward-modulated STDP with an eligibility trace), not MSTDP (reward-modulated STDP), on Atari-like games. The eligibility trace allows the reward to be applied to (recent) past actions. It has been used with some success to solve Pong.
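To illustrate why the trace matters, here is a toy, self-contained example (schematic dynamics only, not BindsNET's actual implementation):

# One synapse, one pre/post coincidence, one delayed reward.
dt, tc_e_trace, lr = 1.0, 25.0, 0.05
e, w = 0.0, 0.0

for t in range(60):
    stdp_term = 1.0 if t == 10 else 0.0   # pre/post coincidence at t = 10
    reward = 1.0 if t == 30 else 0.0      # reward only arrives at t = 30

    # MSTDPET: the coincidence is stored in a decaying eligibility trace, so the
    # late reward can still credit it. Plain MSTDP (dw = lr * reward * stdp_term)
    # would see stdp_term == 0 at t = 30 and learn nothing from this episode.
    e = e * (1.0 - dt / tc_e_trace) + stdp_term
    w = w + lr * reward * e

print(w)  # > 0: the pairing at t = 10 still gets credited by the reward at t = 30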
But CartPole is a more time-continuous task, and MSTDP should work there, to some extent. What is your current model design?
Hi @SimonInParis, thank you for answering.
I see that the only example in the repo using MSTDPET is the Dot Tracing one. When you say MSTDPET works on Atari-like games, do you mean I could just change the learning rule to MSTDPET in breakout_stdp.py and check whether it works, or are there more hyperparameters to change? (A rough sketch of the swap I have in mind is below.) I also think that example/breakout_stdp.py should be updated in the repo to use whichever learning rule and hyperparameters actually work (happy to do that once I get a working version).
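In case it helps, this is the swap I had in mind (a sketch based on my reading of the code; the nu / wmin / wmax / norm values are guesses, not tuned):

from bindsnet.learning import MSTDPET
from bindsnet.network.topology import Connection

# Same layers as in breakout_stdp.py; only the plastic connection changes rule.
middle_out = Connection(
    source=middle,
    target=out,
    update_rule=MSTDPET,   # instead of MSTDP
    nu=1e-1,               # learning rate (untuned guess)
    wmin=0,
    wmax=1,
    norm=0.5 * middle.n,   # weight normalization (also a guess)
)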
I also thought that CartPole would be fairly easy to solve, but alas it just doesn't learn.
The network is as simple as the breakout_stdp net:
from bindsnet.network.nodes import Input, LIFNodes

inpt = Input(n=n_input, traces=True)               # encoded observations
middle = LIFNodes(n=n_hidden, traces=True)         # hidden LIF layer
out = LIFNodes(n=n_output, refrac=0, traces=True)  # output/action layer, no refractory period
with 4 inputs, 50 hidden (though I have tried many variations of those), and 2 outputs. I have tried changing wmin and wmax, and even allowing negative weights and biases, with no luck (rough wiring sketch below).
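For completeness, the full wiring looks roughly like this (a sketch using the layers above; only the hidden-to-output connection is plastic, and the exact values are among the things I have been varying):

from bindsnet.learning import MSTDP
from bindsnet.network import Network
from bindsnet.network.topology import Connection

network = Network(dt=1.0)
network.add_layer(inpt, name="X")
network.add_layer(middle, name="H")
network.add_layer(out, name="Y")

# Fixed random input->hidden connection; plastic (MSTDP) hidden->output connection.
network.add_connection(
    Connection(source=inpt, target=middle, wmin=0, wmax=1e-1),
    source="X", target="H",
)
network.add_connection(
    Connection(source=middle, target=out, update_rule=MSTDP, nu=1e-1, wmin=0, wmax=1),
    source="H", target="Y",
)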
I am not bound to this particular model - or to any particular learning rule - or even to this specific gym env. I want to develop a research idea with spiking nets, but it hinges on having at least one model that can learn in an RL setup.
I will try MSTDPET and report back, but in the meantime any other ideas are welcome.