The distributed example doesn't work

Open nexon33 opened this issue 3 years ago • 5 comments

The example code at https://github.com/CodeReclaimers/neat-python/blob/master/examples/xor/evolve-feedforward-distributed.py doesn't work for me, and I haven't been able to fix it. It fails with:

lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager'
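For context: the traceback appears to come from multiprocessing pickling the manager class while starting the manager process, which is what the spawn start method (the default on Windows) does. pickle stores classes by their importable name, so a class defined inside a function cannot be pickled. Below is a minimal sketch that reproduces the same kind of failure with only the standard library. `make_local_class`, `LocalManager`, `ModuleLevelManager`, and `can_pickle` are illustrative names, not part of neat-python.

```python
import pickle


def make_local_class():
    # The class is created inside a function, so its qualified name
    # contains "<locals>" and pickle cannot look it up by name.
    class LocalManager:
        pass

    return LocalManager


class ModuleLevelManager:
    """Defined at module level, so pickle can reference it by name."""


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Older Python versions raise AttributeError here, newer ones
        # may raise PicklingError, so both are treated as "not picklable".
        return False


if __name__ == "__main__":
    print("module-level class picklable:", can_pickle(ModuleLevelManager))
    print("local class picklable:", can_pickle(make_local_class()))
```

If this is indeed the cause, moving the class definition to module level (or using a start method that does not pickle it) would be the usual way around it, but I have not checked that against the neat.distributed code.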

It would be really cool to run this on multiple devices and have it train a lot quicker.

nexon33 • Sep 15 '22 12:09

I did get it to work, but it is indeed a bit unreliable, like the docs say.

I would be really excited to see this picked up again, though. :)

nexon33 • Sep 15 '22 17:09

Hi,

I am the guy who wrote neat.distributed a couple of years ago. The old version tried to use multiprocessing's distributed functionality, but (as you can see) it has some issues. I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but the rewrite should solve most problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.

Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor, ...), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend using either neat.parallel or PyPy instead (note: PyPy + neat.parallel is slower).
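To make the serialization point concrete, here is a rough sketch that times a cheap evaluation against a pickle round trip using only the standard library. `evaluate_xor` is a hypothetical stand-in for a small fitness function and the three-element list is a stand-in for a genome. Neither is neat-python code, and real genomes are larger and also have to cross the network, which this does not measure.

```python
import pickle
import time


def evaluate_xor(weights):
    """Hypothetical cheap fitness function: a linear unit scored on XOR."""
    cases = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
             ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    error = 0.0
    for (a, b), expected in cases:
        output = weights[0] * a + weights[1] * b + weights[2]
        error += (output - expected) ** 2
    return 4.0 - error


def roundtrip(obj):
    """Serialize and deserialize once, as sending work to another node requires."""
    return pickle.loads(pickle.dumps(obj))


def seconds_per_call(fn, arg, repeats=10_000):
    """Average wall-clock time of fn(arg) over the given number of repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    genome = [0.5, 0.5, 0.0]  # stand-in for a real genome
    print("evaluate:         ", seconds_per_call(evaluate_xor, genome))
    print("pickle round trip:", seconds_per_call(roundtrip, genome))
```

When the two numbers are of the same order, distributing the work is unlikely to pay off. When the evaluation takes seconds or minutes, the serialization cost becomes negligible in comparison.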

bennr01 • Sep 15 '22 18:09

In fact, I'm trying to optimize a function that takes a few minutes to complete, so running it distributed would speed things up a lot and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I will try and take a look at the code tomorrow.

nexon33 • Sep 15 '22 18:09

I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge them so that I can start selecting which code should stay and which shouldn't.

Is there any other way I can contact you?

nexon33 • Sep 16 '22 07:09

> In fact, I'm trying to optimize a function that takes a few minutes to complete, so running it distributed would speed things up a lot and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I haven't tested it. In theory it should work as long as you set num_workers=1 on each secondary node and manually start a PyPy process for each core on each secondary node. This is because, IIRC, PyPy loses a lot of its performance benefits when using multiprocessing.Pool, although this may depend on the exact use case and may have changed in the last couple of years. Running a separate PyPy process for each core may allow you to circumvent this.
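A rough, untested sketch of the "one PyPy process per core" idea, using only the standard library. The interpreter name `pypy3`, the script name `evolve_worker.py`, and the `--workers` flag are all hypothetical placeholders. Substitute whatever your own secondary-node script uses to set num_workers=1.

```python
import subprocess


def build_worker_commands(n_cores, interpreter="pypy3", script="evolve_worker.py"):
    """Build one command line per core.

    Each command starts a separate interpreter that runs a single
    evaluation worker. The script name and the "--workers" flag are
    placeholders, not an actual neat-python interface.
    """
    return [[interpreter, script, "--workers", "1"] for _ in range(n_cores)]


def launch_workers(commands):
    """Start every command as its own OS process and return the handles."""
    return [subprocess.Popen(command) for command in commands]
```

On a secondary node you would then call something like `launch_workers(build_worker_commands(os.cpu_count() or 1))` and wait on the returned handles. This avoids multiprocessing.Pool inside PyPy entirely, at the price of managing the processes yourself.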

> I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge them so that I can start selecting which code should stay and which shouldn't.

For anyone else reading this: I've responded to a separate issue in my fork here.

bennr01 • Sep 16 '22 14:09