mxnet-ssd

Reproducibility

Open assafmus opened this issue 9 years ago • 4 comments

I'm trying to debug why I get different results when training on a single GPU versus multiple GPUs (all other parameters are the same).

First, I'm trying to fix the training initialization and randomization so I can compare the two setups. However, even with a single GPU the results are not repeatable between runs, even after setting the seeds:

numpy.random.seed(0)
mxnet.random.seed(0)
random.seed(0)

This should work according to https://github.com/dmlc/mxnet/issues/736

Is there anything I'm missing?

assafmus • Jan 05 '17 16:01

I think the trick should work. Make sure you fix the seeds before importing any other module in train.py. Let me know if it still fails.

zhreshold • Jan 09 '17 08:01
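The suggestion above, seeding before the rest of train.py is imported, could look roughly like the sketch below. It is an illustration only, not the repository's actual train.py. The `try`/`except ImportError` guards are an assumption added so the snippet still runs where NumPy or MXNet is not installed; a real training script would import them unconditionally.

```python
# Top of train.py (sketch): seed every RNG before the remaining imports run.
import random

SEED = 0  # same value as in the issue; any fixed integer would do

random.seed(SEED)

try:
    import numpy
    numpy.random.seed(SEED)
except ImportError:  # guard only so this sketch runs without NumPy
    numpy = None

try:
    import mxnet
    mxnet.random.seed(SEED)
except ImportError:  # guard only so this sketch runs without MXNet
    mxnet = None

# The script's remaining imports (data iterators, training code, ...)
# would follow here, after all seeds are fixed.
```

Seeding first only helps if no later module reseeds or draws from a different generator, so this is a starting point for debugging rather than a guarantee.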

I added the seed calls to the beginning of train.py, but it still doesn't work. It might be a general issue in mxnet, as I was also unable to make the image-classification example reproducible with this method. I did get a toy example to reproduce, but I'm not sure what exactly is different.

assafmus • Jan 09 '17 08:01
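One way to narrow down where a toy example and a full training run differ is to run the same seeded step twice and compare the outputs, then swap in larger pieces of the pipeline until the comparison fails. The sketch below uses only the standard library; `run_once` and `unseeded_run` are hypothetical stand-ins for a short training run that returns, say, initial weights or the first few loss values.

```python
import random


def run_once(seed):
    """Stand-in for a short training run whose result depends on the RNG."""
    random.seed(seed)
    return [random.random() for _ in range(5)]


def unseeded_run(seed):
    """Stand-in for a run that ignores the seed, i.e. a non-reproducible step."""
    return [random.random() for _ in range(5)]


def is_reproducible(run, seed, repeats=2):
    """Return True if run(seed) yields identical results on every repeat."""
    first = run(seed)
    return all(run(seed) == first for _ in range(repeats - 1))


print("seeded run reproducible:", is_reproducible(run_once, seed=0))
print("unseeded run reproducible:", is_reproducible(unseeded_run, seed=0))
```

Replacing `run_once` with progressively larger parts of the real pipeline (data loading, weight initialization, one training step) would show which stage first breaks repeatability.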

I am having the same issue. Any progress on this?

jmerkow • May 28 '17 16:05

I think it would be good to be able to set an environment variable when reproducibility is desired. random is imported in so many modules (data.py, trainXXX.py), and the same holds for numpy, so it is difficult to keep track of and understand which one gets called first.

al-rigazzi • Sep 05 '17 13:09
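The environment-variable idea above could be sketched as a single helper that the entry script calls once, so the seeding logic lives in one place instead of being scattered across modules. The variable name `MXNET_SSD_SEED` and the function `seed_from_env` are hypothetical; neither exists in MXNet or in this repository. The import guards are again an assumption so the snippet runs without NumPy or MXNet installed.

```python
import os
import random

SEED_ENV_VAR = "MXNET_SSD_SEED"  # hypothetical name, not an existing MXNet setting


def seed_from_env(default=None):
    """Seed the known RNGs from the environment; return the seed used, or None."""
    raw = os.environ.get(SEED_ENV_VAR)
    seed = int(raw) if raw is not None else default
    if seed is None:
        return None  # variable unset and no default: leave RNGs unseeded

    random.seed(seed)
    try:
        import numpy
        numpy.random.seed(seed)
    except ImportError:  # guard only so this sketch runs without NumPy
        pass
    try:
        import mxnet
        mxnet.random.seed(seed)
    except ImportError:  # guard only so this sketch runs without MXNet
        pass
    return seed
```

With such a helper, a run could be started as `MXNET_SSD_SEED=0 python train.py`, with `seed_from_env()` called at the very top of the script; leaving the variable unset would keep the current unseeded behavior.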