
fairseq-tensorboard

NOTE: the functionality in this library has been part of fairseq itself since commit 257a3b8 (included in fairseq release 0.6.3).


This is a small utility to monitor fairseq training in tensorboard.

It is not a fork of fairseq, but just a small class that extends its functionality with tensorboard logging.
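
The core of the idea fits in a few lines. Here is a minimal sketch, assuming a task that subclasses fairseq's TranslationTask and mirrors aggregated stats to tensorboard via tensorboardX; the hook, tag, and attribute names are illustrative, not fstb's actual internals:

from fairseq.tasks import register_task
from fairseq.tasks.translation import TranslationTask
from tensorboardX import SummaryWriter

@register_task('monitored_translation')
class MonitoredTranslationTask(TranslationTask):
    def __init__(self, args, src_dict, tgt_dict):
        super().__init__(args, src_dict, tgt_dict)
        # One writer per run; logging under save_dir is an assumption.
        self.writer = SummaryWriter(log_dir=args.save_dir)
        self.step = 0

    def aggregate_logging_outputs(self, logging_outputs, criterion):
        # Let fairseq aggregate the stats as usual, then mirror the loss
        # to tensorboard as a scalar.
        agg = super().aggregate_logging_outputs(logging_outputs, criterion)
        self.step += 1
        if 'loss' in agg:
            self.writer.add_scalar('loss', agg['loss'], self.step)
        return agg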

Installation and Usage

You just need to clone fairseq-tensorboard, install its only direct dependency besides fairseq itself (tensorboardX), and launch fairseq's train.py with monitored_translation as the task:

pip install tensorboardX

git clone https://github.com/noe/fairseq-tensorboard.git

python fairseq/train.py \
   --user-dir ./fairseq-tensorboard/fstb \
   --task monitored_translation [...]
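
Once training is running, point tensorboard at the directory containing the event files (the path below is a placeholder; use whatever log directory your run writes to):

tensorboard --logdir ./checkpoints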

Features

  • Logs fairseq training and validation losses.
  • Saves sys.argv and fairseq's args for model traceability (see the sketch after this list).
  • Allows plotting training and validation losses in the same plot (also covered in the sketch below).
  • Supports multi-GPU training.
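
A minimal sketch of how the traceability and same-plot features can be realized with tensorboardX (the helper name, tags, and directory layout are assumptions, not fstb's actual API): the command line and parsed args go in as text, and tensorboard overlays scalars written under the same tag from writers pointing at sibling directories, which puts training and validation losses in one plot.

import sys
from tensorboardX import SummaryWriter

def setup_monitoring(args):
    # Two writers under sibling directories: tensorboard overlays their
    # scalars in a single plot whenever the tag matches.
    train_writer = SummaryWriter(log_dir=args.save_dir + '/train')
    valid_writer = SummaryWriter(log_dir=args.save_dir + '/valid')
    # Traceability: record the exact command line and the parsed args.
    train_writer.add_text('run/sys.argv', ' '.join(sys.argv))
    train_writer.add_text('run/args', str(vars(args)))
    return train_writer, valid_writer

# Writing the same tag from both writers overlays the two curves:
#   train_writer.add_scalar('loss', train_loss, step)
#   valid_writer.add_scalar('loss', valid_loss, step)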

FAQ

Why should I use fstb?

Because it allows you to visually diagnose your losses!

You would change this:

[Image: obscure terminal logs]

...into this:

[Image: intuitive tensorboard plot]

How can fairseq load fstb?

You have to provide fairseq with the command-line argument --user-dir set to the path of fstb. This instructs fairseq to load the fstb code, which registers the monitored_translation task.
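
In essence, --user-dir makes fairseq import the given package, and the import itself performs the registration as a side effect of the @register_task decorator. A package along these lines would do (the module name here is illustrative):

# fstb/__init__.py
# Importing the module that defines the task is enough: the
# @register_task('monitored_translation') decorator runs at import time
# and adds the task to fairseq's task registry.
from . import monitored_translation_task  # noqa: F401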

Does fstb work with multi-GPU training?

Yes, it has been tested with single-node multi-GPU training. Only the first worker process logs to tensorboard; the behaviour of the remaining workers is unaltered.
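
A plausible way to implement that guard (a sketch, not necessarily fstb's exact code) is to create the writer only on the first worker, using fairseq's distributed_rank argument, and skip logging everywhere else:

from tensorboardX import SummaryWriter

def maybe_create_writer(args):
    # Only rank 0 writes tensorboard event files; the remaining workers
    # get None and simply skip all logging calls.
    if getattr(args, 'distributed_rank', 0) == 0:
        return SummaryWriter(log_dir=args.save_dir)
    return None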