Performance
Hi, thanks for your work! But I wonder whether this code can reproduce the performance reported in the "Neural Episodic Control" paper. Have you tested it?
Hi, I have tested the algorithm on a few Atari games. While it seems to work okay and learns fairly quickly, it unfortunately doesn't quite match the results given in the paper.
This could be for a number of reasons:
- It isn't mentioned in the paper, but when an identical state is encountered again they look up its existing entry in the dictionary (via a separate hash) and replace it, rather than adding a duplicate entry. I didn't implement this because it would require defining a hash function for each separate dataset.
- Differences in approximate k-nearest-neighbour algorithm used.
- Some minor implementation differences (these can make a big difference).
- Some experimental differences which I don't know about.
- Possibility that my implementation is wrong.
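For reference, the deduplication scheme from the first bullet could look something like the sketch below. This is not code from the paper or from this repo; the class, method names, and the bytes-based hash are all illustrative assumptions, and the hash shown is exactly the kind of per-dataset choice mentioned above.

```python
# Hypothetical sketch of hash-based deduplication for a DND-style memory:
# before appending a new (key, value) pair, look up the raw state's hash
# and overwrite the existing slot if that state was seen before.
# All names here are illustrative, not taken from the original codebase.
import numpy as np

class DedupMemory:
    def __init__(self):
        self.keys = []           # embedding keys used for k-NN lookup
        self.values = []         # Q-value estimates
        self.index_by_hash = {}  # maps state hash -> slot in keys/values

    @staticmethod
    def state_hash(state):
        # One possible dataset-specific hash: the raw observation bytes.
        # Defining something like this per dataset is the burden noted above.
        return hash(np.ascontiguousarray(state).tobytes())

    def write(self, state, key, value):
        h = self.state_hash(state)
        if h in self.index_by_hash:
            # Identical state seen before: replace its entry in place
            i = self.index_by_hash[h]
            self.keys[i] = key
            self.values[i] = value
        else:
            # New state: append a fresh entry
            self.index_by_hash[h] = len(self.keys)
            self.keys.append(key)
            self.values.append(value)
```

Writing the same state twice then updates one slot instead of growing the memory, which is the behavior the bullet describes.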
All in all, chasing down their reported results seems a bit futile given all the possible differences, especially as a small difference in performance isn't really important for Reinforcement Learning at this stage.
That said, let me know if you're seeing significant performance issues (say, if you're not clearly outperforming a random baseline after ~250000 steps).