Aaron Harlap
@reedwm Thanks for the suggestion, I will try that and post the results. Just to confirm my understanding of the benchmarking code. In the distributed_replicated (VariableMgrDistributedReplicated) mode, the gradients are...
@reedwm I experimented with using half the learning rate, and the result was even a bit further away: Single machine: 0.47, Two machine same LR: 0.44, Two machine half LR:...
@reedwm Thanks, I will try running both single machine and distributed replicated with those flags, and post the results when they finish. I have tried running with SGD instead of...
We tried running with those config options and we found that: SGD 1 Machine - MB 128 - LR 0.01 = Accuracy @ 1 = 0.2851 Accuracy @ 5 =...
I have encountered a similar problem. Has a fix for this ever come up?
@konnase are you running inside docker?
Try running the docker containers with `--net=host` mode.
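For anyone unsure what that looks like in practice, here is a minimal sketch. The image name `my-tf-benchmarks:latest` and the helper `docker_host_cmd` are hypothetical placeholders, not something from this thread, and the benchmark arguments are only an illustration. The helper just prints the command so you can check it before running it on each machine.

```shell
# Hypothetical image name; replace with the image you actually run.
IMAGE="my-tf-benchmarks:latest"

# Print a docker command that uses the host's network stack (--net=host)
# instead of the default bridge network, so no per-port -p mappings are needed.
docker_host_cmd() {
  echo "docker run --rm --net=host $IMAGE $*"
}

# Inspect the command first, then run it by hand on each machine.
docker_host_cmd python tf_cnn_benchmarks.py --variable_update=distributed_replicated
```

Note that host networking behaves as described on Linux hosts; on other platforms Docker runs inside a VM, so results may differ.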