Regarding GPipe

Open Raviteja1996 opened this issue 6 years ago • 7 comments

Hi, I want to test how GPipe works. When I searched the web, I found the lingvo repository. Could you tell me how to run it? I didn't find any documentation, so I was a little confused.

Raviteja1996 avatar Mar 19 '19 03:03 Raviteja1996

@bignamehyp any further comments?

drpngx avatar Mar 19 '19 04:03 drpngx

We will publish better instructions for running GPipe in the near future.

An example of running GPipe is given in the comments here: https://github.com/tensorflow/lingvo/blob/master/lingvo/tasks/lm/params/one_billion_wds.py#L180.

Once you have modified the OneBWdsGPipeTransformer hparams, you can start the trainer on 8 GPUs:

bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --worker_split_size=8
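
(Editorial sketch, not part of the original comment.) The command above can be assembled for a different GPU count with a small helper. The helper name `build_trainer_command` is hypothetical. It uses only the flags quoted in this thread. It assumes that `--worker_split_size` should equal the number of GPUs, which is inferred from the 8-GPU example and is not confirmed here. It does not change the model hparams, which may also need to match your hardware.

```python
import shlex


def build_trainer_command(num_gpus, logdir="/tmp/mnist/log"):
    """Return the trainer invocation quoted in this thread as an argument list.

    Assumption: --worker_split_size is set to the number of GPUs, as in the
    8-GPU example above. This helper is illustrative and not part of Lingvo.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "bazel-bin/lingvo/trainer",
        "--run_locally=gpu",
        "--mode=sync",
        "--model=lm.one_billion_wds.OneBWdsGPipeTransformer",
        f"--logdir={logdir}",
        "--logtostderr",
        f"--worker_split_size={num_gpus}",
    ]


# Print the command as a single shell line; it is only printed, not executed.
command = build_trainer_command(num_gpus=8)
print(shlex.join(command))
```

The list form can be passed to `subprocess.run` from the repository root once the trainer has been built with bazel.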

General instructions for installing and running Lingvo models are provided at https://github.com/tensorflow/lingvo/blob/master/README.md

bignamehyp avatar Mar 19 '19 04:03 bignamehyp

Will there be tutorials for image classification, e.g. AmoebaNet? Thanks

WonderAndMaps avatar Mar 23 '19 09:03 WonderAndMaps

Hi, above you mentioned changing the OneBWdsGPipeTransformer hparams and then running on 8 GPUs, and you gave the command to run. I did not understand what those parameters are. Could I get help choosing parameters that fit my system? I am using a machine with 4 GPUs. Whatever parameters I change, I get a segmentation fault (core dumped). I am also attaching my system info (GPU).

Command: bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.OneBWdsGPipeTransformer --logdir=/tmp/mnist/log --logtostderr --worker_split_size=4

segmentation fault.txt

system info: GPU: sys_info.txt

Raviteja1996 avatar Apr 02 '19 05:04 Raviteja1996

Hi, is there any update on the above post?

Raviteja1996 avatar Apr 03 '19 03:04 Raviteja1996

Is it still an open issue?

bignamehyp avatar Jun 01 '19 08:06 bignamehyp

It would be great if more guidance (tutorials) could be offered for running GPipe on image classification models, such as the AmoebaNet models evaluated in the GPipe arXiv paper and blog :)

Fei

feiwang3311 avatar Jul 09 '19 01:07 feiwang3311