ffcv
Lots of effort has gone into accelerating training.
Key changes:
- Use a crop-decode strategy that decodes only the cropped region of each JPEG (see *Towards Pretraining Masked Autoencoders in One Day*). This requires TurboJPEG 3.0.
- Reduce memory allocations; see https://github.com/erow/ffcv/blob/main/ffcv/fields/rgb_image.py#L107
- Transformations
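The crop-decode idea can be sketched as follows: choose the random crop box *before* decoding, then ask the decoder for just that region. This is a minimal sketch under stated assumptions, not this repository's code. `sample_crop_box` and `decode_region` are hypothetical helpers, the sampling loosely follows torchvision's `RandomResizedCrop` parameters, and the 16-pixel alignment assumes that, as with libjpeg-turbo's partial decoding, the region's left edge must fall on an iMCU boundary (8 or 16 pixels depending on chroma subsampling).

```python
import math
import random


def sample_crop_box(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Hypothetical helper: sample a RandomResizedCrop-style box (x, y, w, h)."""
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            return rng.randint(0, width - w), rng.randint(0, height - h), w, h
    # Simplified fallback: whole image (torchvision uses a clamped-ratio center crop).
    return 0, 0, width, height


def decode_region(x, y, w, h, imcu_width=16):
    """Hypothetical helper: widen the box so its left edge sits on the iMCU grid.

    Assumption: the decoder only needs the left edge aligned. Returns the region
    to decode and how many leading columns to trim to recover the exact crop.
    """
    aligned_x = (x // imcu_width) * imcu_width
    trim_left = x - aligned_x
    return (aligned_x, y, w + trim_left, h), trim_left


if __name__ == "__main__":
    rng = random.Random(0)
    box = sample_crop_box(500, 375, rng=rng)
    region, trim = decode_region(*box)
    print("crop:", box, "decode:", region, "trim:", trim)
```

After decoding `region`, only the first `trim` columns need to be dropped before resizing, so when the crop is small most of the image is never decoded.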
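A common way to cut per-batch allocations is to allocate the output batch once and write each sample into it in place. The sketch below illustrates that general technique with a hypothetical `BatchBuffer` class. It is an assumption about the kind of change meant, not the code at the linked line.

```python
import numpy as np


class BatchBuffer:
    """Hypothetical helper: one pre-allocated uint8 batch array reused across batches."""

    def __init__(self, batch_size, height, width, channels=3):
        # Allocated once; every later batch is written into this same storage.
        self.array = np.empty((batch_size, height, width, channels), dtype=np.uint8)

    def fill(self, images):
        # Copy each image into its slot in place; no new batch array is created.
        for i, img in enumerate(images):
            np.copyto(self.array[i], img)
        # Return a view (not a copy) covering only the filled slots.
        return self.array[: len(images)]
```

The trade-off is aliasing: each call to `fill` overwrites the storage returned by the previous call, so a consumer must finish with (or copy) a batch before the next one is produced.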
Comparison of throughput:
| img_size | 112 | 160 | 192 | 224 | 224 | 224 | 224 | 224 | 512 |
|---|---|---|---|---|---|---|---|---|---|
| batch_size | 512 | 512 | 512 | 128 | 256 | 512 | 512 | 512 | 512 |
| num_workers | 10 | 10 | 10 | 10 | 10 | 5 | 10 | 20 | 10 |
| ours | 23024.0 | 19396.5 | 16503.6 | 16536.1 | 16338.5 | 12369.7 | 14521.4 | 14854.6 | 4260.3 |
| ffcv | 16853.2 | 13906.3 | 13598.4 | 12192.7 | 11960.2 | 9112.7 | 12539.4 | 12601.8 | 3577.8 |
Comparison of memory usage:
| img_size | 112 | 160 | 192 | 224 | 224 | 224 | 224 | 224 | 512 |
|---|---|---|---|---|---|---|---|---|---|
| batch_size | 512 | 512 | 512 | 128 | 256 | 512 | 512 | 512 | 512 |
| num_workers | 10 | 10 | 10 | 10 | 10 | 5 | 10 | 20 | 10 |
| ours | 9.0 | 9.8 | 11.4 | 5.8 | 7.7 | 11.4 | 11.4 | 11.4 | 34.0 |
| ffcv | 13.4 | 14.8 | 17.7 | 7.6 | 11.0 | 17.7 | 17.7 | 17.7 | 56.6 |
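The relative gains can be read off the two tables with a little arithmetic. The sketch below copies the reported values and computes, per column, the ours/ffcv ratio for throughput (speedup) and for memory. The `CONFIGS` labels read the header rows as `img_size` 224 covering five columns and `batch_size` 512 covering the three `num_workers` settings; that reading is an assumption.

```python
# Column configurations in table order: (img_size, batch_size, num_workers).
CONFIGS = [(112, 512, 10), (160, 512, 10), (192, 512, 10), (224, 128, 10), (224, 256, 10),
           (224, 512, 5), (224, 512, 10), (224, 512, 20), (512, 512, 10)]

throughput = {
    "ours": [23024.0, 19396.5, 16503.6, 16536.1, 16338.5, 12369.7, 14521.4, 14854.6, 4260.3],
    "ffcv": [16853.2, 13906.3, 13598.4, 12192.7, 11960.2, 9112.7, 12539.4, 12601.8, 3577.8],
}
memory = {
    "ours": [9.0, 9.8, 11.4, 5.8, 7.7, 11.4, 11.4, 11.4, 34.0],
    "ffcv": [13.4, 14.8, 17.7, 7.6, 11.0, 17.7, 17.7, 17.7, 56.6],
}

# Ratio > 1 means ours is faster; ratio < 1 means ours uses less memory.
speedup = [o / f for o, f in zip(throughput["ours"], throughput["ffcv"])]
mem_ratio = [o / f for o, f in zip(memory["ours"], memory["ffcv"])]

for (img, bs, nw), s, m in zip(CONFIGS, speedup, mem_ratio):
    print(f"img={img:>3} bs={bs:>3} nw={nw:>2}  speedup={s:.2f}x  memory={m:.0%} of ffcv")
```

By this arithmetic, ours is roughly 1.16x to 1.39x faster and uses roughly 60% to 76% of ffcv's memory across the listed configurations.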