
[ELECTRA/TensorFlow2] Minor: README Invokes Slurm sbatch With Incorrect Parameter?

Open psharpe99 opened this issue 2 years ago • 0 comments

Related to ELECTRA/TensorFlow2

Describe the bug: The README in the MultiNode section says

BATCHSIZE=176 LR=6e-3 GRAD_ACCUM_STEPS=1 PHASE=1 STEPS=10000 WARMUP=2000 b1=0.878 b2=0.974 decay=0.5 skip_adaptive=yes end_lr=0.0 sbatch N48 --ntasks-per-node=8 run.sub

BATCHSIZE=24 LR=4e-3 GRAD_ACCUM_STEPS=3 PHASE=2 STEPS=930 WARMUP=200 b1=0.878 b2=0.974 decay=0.5 skip_adaptive=yes end_lr=0.0 sbatch N48 --ntasks-per-node=8 run.sub

I think that this should be "-N48": the slurm sbatch manpage has

sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...] 
     :
-N, --nodes=<minnodes>[-maxnodes]|<size_string>
    Request that a minimum of minnodes nodes be allocated to this job. 

As written, sbatch would treat "N48" as the script name rather than as an option, because it does not start with a dash.
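To illustrate the point, here is a minimal sketch of the "first non-option word is the script" rule implied by the synopsis above. The `first_positional` helper is hypothetical and is not how Slurm parses its arguments internally; it also assumes every option carries its value in the same word (`-N48`, `--ntasks-per-node=8`), as in the README commands.

```shell
# Hypothetical helper, not part of Slurm: prints the first argument that does
# not start with "-". For a synopsis of the form
#   cmd [OPTIONS...] script [args...]
# that word is the one taken as the script.
# Simplification: options with a separate value word (e.g. "-N 48") are not handled.
first_positional() {
  for arg in "$@"; do
    case "$arg" in
      -*) ;;                                # looks like an option, skip it
      *) printf '%s\n' "$arg"; return 0 ;;  # first non-option word
    esac
  done
  return 1                                  # no positional argument found
}

first_positional N48 --ntasks-per-node=8 run.sub    # README form: prints N48
first_positional -N48 --ntasks-per-node=8 run.sub   # proposed fix: prints run.sub
```

With the leading dash restored, `run.sub` becomes the first positional word, which matches the intent of the README.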

To Reproduce: N/A

Expected behavior: N/A

Environment: N/A

psharpe99, Jul 07 '23 09:07