[ALBERT] Pre-training on TPU Pod
Hi all, could I run pre-training on a TPU Pod v2-256 with the large/xlarge v2 config (batch 4096, 3M steps, ...)? Is there a known-working config for this?
I'm wondering about this too. According to https://cloud.google.com/tpu/docs/training-on-tpu-pods?hl=ko, you should keep the per-core batch size the same when moving to a pod, i.e. multiply the global batch size by the ratio of core counts and divide the number of training steps by the same ratio. A v2-256 has 32× the cores of a v2-8 (256 / 8 = 32), so should I set the config like this (batch size: 4096 × 32 = 131,072, steps: 3M / 32 = 93,750)?
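For concreteness, here is a minimal sketch of that scaling arithmetic, assuming the baseline config (batch 4096, 3M steps) was tuned for a v2-8; the variable names are just for illustration:

```python
# Sketch: scale an ALBERT pre-training config from a v2-8 baseline to a
# v2-256 pod, keeping the per-core batch size constant per the TPU Pod doc.

BASE_CORES = 8                      # TPU v2-8
POD_CORES = 256                     # TPU v2-256
SCALE = POD_CORES // BASE_CORES     # 32

base_batch_size = 4096
base_steps = 3_000_000

pod_batch_size = base_batch_size * SCALE    # 131_072 (global batch)
pod_steps = base_steps // SCALE             # 93_750

# Per-core batch size is unchanged: 4096 / 8 == 131072 / 256 == 512.
assert base_batch_size / BASE_CORES == pod_batch_size / POD_CORES

print(pod_batch_size, pod_steps)
```

If this is right, the scaled values would then go into the usual pre-training flags (e.g. the `--train_batch_size` and `--num_train_steps` arguments of ALBERT's `run_pretraining.py`), but I'd appreciate confirmation from anyone who has actually trained on a pod slice.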