Reproducibility
I'm trying to debug why I'm getting different results when using a single GPU vs. multiple GPUs (all other parameters are the same).
First, I'm trying to fix the training initialization and randomization so I can compare the two options. However, even with a single GPU the results are not reproducible across runs, even after setting the seeds:
numpy.random.seed(0)
mxnet.random.seed(0)
random.seed(0)
This should work according to https://github.com/dmlc/mxnet/issues/736
Is there anything I'm missing?
I think the trick should work; make sure you set the seeds before importing any other module in train.py. Let me know if it still fails.
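As a minimal sketch of the suggestion above: call the seed functions at the very top of train.py, before importing any project modules that might capture RNG state at import time. The `draw` helper and file layout here are hypothetical, and the `mxnet.random.seed` call is shown as a comment so the sketch runs without mxnet installed:

```python
# train.py -- seed every RNG first, before any other project imports
import random
import numpy

random.seed(0)
numpy.random.seed(0)
# import mxnet
# mxnet.random.seed(0)   # seed mxnet here as well, before building the model

# ... only now import data.py, model definitions, etc.

def draw():
    # hypothetical helper: any code imported after the seeds were set
    # should see the same random stream on every run
    return numpy.random.rand(3).tolist()
```

With the seeds fixed this way, two runs of the script should produce identical draws.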
I added the seed calls to the beginning of train.py but it still doesn't work. It might be a general issue in mxnet, as I was unable to make the image-classification example reproducible using this method. I was able to make a toy example reproducible, but I'm not sure what exactly is different.
I am having the same issue. Any progress on this?
I think it would be good to be able to set an environment variable if one desires reproducibility. `random` is imported in so many modules (data.py, trainXXX.py), and the same holds for numpy, so it is difficult to keep track of them and to understand which one gets called first.
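One way the env-variable idea could look, as a sketch rather than anything mxnet provides today: a single helper that reads a hypothetical `MXNET_SEED` variable and seeds every RNG in one place, so individual modules don't need to coordinate:

```python
import os
import random
import numpy

def seed_from_env(var="MXNET_SEED"):
    """Seed all RNGs from an environment variable, if it is set.

    Hypothetical convention: MXNET_SEED=0 requests reproducible runs;
    leaving it unset keeps the default non-deterministic behaviour.
    """
    val = os.environ.get(var)
    if val is None:
        return None
    seed = int(val)
    random.seed(seed)
    numpy.random.seed(seed)
    # mxnet.random.seed(seed) would go here too, once mxnet is imported
    return seed
```

Calling `seed_from_env()` once at program start would then make the whole run deterministic whenever the variable is set, regardless of which module touches `random` or `numpy` first.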