Aaron Harlap comments

Results: 7 comments by Aaron Harlap

Different accuracy in distributed vs single machine convergence

@reedwm Thanks for the suggestion. I will try that and post the results. Just to confirm my understanding of the benchmarking code: in the `distributed_replicated` (`VariableMgrDistributedReplicated`) mode, the gradients are...

Different accuracy in distributed vs single machine convergence

@reedwm I experimented with half the learning rate, and the result was even a bit further from the single-machine result: single machine: 0.47; two machines, same LR: 0.44; two machines, half LR:...
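
One common reason to try half the learning rate on two machines is the difference between summing and averaging gradients across workers. The thread excerpt does not say which aggregation the benchmark code uses, so the sketch below is only an illustration under that assumption. The function `sgd_step` and the toy weights and gradients are hypothetical and are not taken from the benchmarking code.

```python
import math

def sgd_step(weights, worker_grads, lr, reduce="mean"):
    """One synchronous SGD step over a list of per-worker gradients.

    reduce="mean" averages the gradients across workers,
    reduce="sum" adds them up without dividing.
    """
    n = len(worker_grads)
    summed = [sum(g[i] for g in worker_grads) for i in range(len(weights))]
    scale = 1.0 / n if reduce == "mean" else 1.0
    return [w - lr * scale * s for w, s in zip(weights, summed)]

# Toy example with two workers (all values are made up).
weights = [1.0, -2.0]
worker_grads = [[0.5, 0.1], [0.3, -0.2]]

mean_step = sgd_step(weights, worker_grads, lr=0.01, reduce="mean")
sum_half_step = sgd_step(weights, worker_grads, lr=0.005, reduce="sum")

# With two workers, summing at half the LR matches averaging at the full LR.
same_update = all(math.isclose(a, b) for a, b in zip(mean_step, sum_half_step))
print(same_update)
```

If the gradients are already averaged, halving the learning rate instead shrinks the effective step, which may be one explanation for a worse result, though other factors could also be involved.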

Different accuracy in distributed vs single machine convergence

@reedwm Thanks, I will try running both single machine and distributed replicated with those flags, and post the results when they finish. I have tried running with SGD instead of...

Different accuracy in distributed vs single machine convergence

We tried running with those config options and found: SGD, 1 machine, MB 128, LR 0.01: Accuracy @ 1 = 0.2851, Accuracy @ 5 =...

It's necessary to endure that the network bandwidth is used up

I have encountered a similar problem. Has a fix for this ever come up?

It's necessary to endure that the network bandwidth is used up

@konnase are you running inside docker?

It's necessary to endure that the network bandwidth is used up

Try running the Docker containers with host networking (`--net=host`).
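
As a rough sketch of that suggestion, the helper below builds a `docker run` command line with `--net=host`, so the container shares the host's network stack instead of going through Docker's default bridge network. The helper name `docker_run_host_net`, the image name `my-training-image`, and the `train.py` script are placeholders for illustration, not names from the original thread.

```python
import shlex

def docker_run_host_net(image, *args):
    """Build a `docker run` argv that uses the host's network stack."""
    return ["docker", "run", "--net=host", image, *args]

# Hypothetical image and entry point; substitute your own.
cmd = docker_run_host_net("my-training-image", "python", "train.py")
print(shlex.join(cmd))
# To actually launch the container: subprocess.run(cmd, check=True)
```

With host networking, published ports (`-p`) are not used, because the container binds directly to the host's interfaces.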