Regarding GPipe
Hi, I want to test how GPipe works. When I searched the web, I found the lingvo repository. Can you tell me how to run it? I couldn't find any documentation, so I was a little confused.
@bignamehyp any further comments?
We will publish better instructions for running GPipe in the near future.
An example of running GPipe is provided in the comments here: https://github.com/tensorflow/lingvo/blob/master/lingvo/tasks/lm/params/one_billion_wds.py#L180.
Once you have modified the OneBWdsGPipeTransformer hparams, you can start the trainer on 8 GPUs:
bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --worker_split_size=8
General instructions for installing and running Lingvo models are provided at https://github.com/tensorflow/lingvo/blob/master/README.md
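For anyone adapting the command above to a different number of GPUs, here is a minimal sketch that only assembles the command line and prints it. It uses just the flags quoted in this thread. The helper name build_trainer_cmd is hypothetical, and setting --worker_split_size equal to the GPU count is an assumption based on the 8-GPU example, not something the Lingvo docs confirm here. The model hparams still need to be edited as described in the linked comments.

```python
import shlex

# Path to the trainer binary, as quoted in the command above.
TRAINER = "bazel-bin/lingvo/trainer"


def build_trainer_cmd(
    num_gpus,
    model="lm.one_billion_wds.OneBWdsGPipeTransformer",
    logdir="/tmp/mnist/log",
):
    """Return the trainer command as a list of arguments.

    Hypothetical helper: it only mirrors the flags from the command above.
    Assumption: --worker_split_size should match the number of GPUs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        TRAINER,
        "--run_locally=gpu",
        "--mode=sync",
        f"--model={model}",
        f"--logdir={logdir}",
        "--logtostderr",
        f"--worker_split_size={num_gpus}",
    ]


if __name__ == "__main__":
    # Print the command for a 4-GPU machine; this does not launch anything.
    print(shlex.join(build_trainer_cmd(4)))
```

Printing the command instead of launching it makes it easy to compare against the 8-GPU example before running anything.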
Will there be tutorials for image classification, e.g. AmoebaNet? Thanks
Hi, above you mentioned changing the OneBWdsGPipeTransformer hparams and then running on 8 GPUs, and gave the command to run. I do not understand what those parameters are. Can I get help choosing parameters that fit my system? I am using a machine with 4 GPUs. Whatever parameters I change, I get a segmentation fault (core dumped). I am also attaching my system info (GPU).
Command: bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --worker_split_size=4
System info (GPU): sys_info.txt
Hi, is there any update on the above post?
Is it still an open issue?
It would be great if more guidance (tutorials) could be offered for running GPipe on image classification models, such as the AmoebaNet models evaluated in the GPipe arXiv paper and blog :)
Fei