Lx add profile
Add performance-test profiles for the operators in test_abs.py and test_activation.py under oneflow/python/oneflow/test/modules.
Performance test results for test_activation.py (screenshot):

Performance test results for test_abs.py (screenshot):

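Update gen_ops_process.py
Description: update the interface search script gen_ops_process.py to match the structure of the refactored rst files (note: this also fixes some non-conforming rst documents).
The header and footer of the generated md file preview are shown in the two images below.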

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12826.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.3ms (= 14133.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.3ms / 128.3ms)
OneFlow resnet50 time: 75.2ms (= 7520.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.9ms (= 8289.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 82.9ms / 75.2ms)
OneFlow resnet50 time: 48.4ms (= 9676.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.8ms (= 11954.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.24 (= 59.8ms / 48.4ms)
OneFlow resnet50 time: 36.1ms (= 7218.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 39.8ms (= 7953.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.10 (= 39.8ms / 36.1ms)
OneFlow resnet50 time: 28.5ms (= 5700.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7142.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.25 (= 35.7ms / 28.5ms)
OneFlow swin dataloader time: 0.263s (= 52.668s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.003s / 200, num_workers=1)
Relative speed: 0.570 (= 0.150s / 0.263s)
OneFlow swin dataloader time: 0.070s (= 13.987s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.672s / 200, num_workers=4)
Relative speed: 0.620 (= 0.043s / 0.070s)
OneFlow swin dataloader time: 0.040s (= 8.019s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.378s / 200, num_workers=8)
Relative speed: 0.546 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.7ms (= 13667.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.0ms (= 16001.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 160.0ms / 136.7ms)
OneFlow resnet50 time: 84.9ms (= 8486.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.8ms (= 10178.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.8ms / 84.9ms)
OneFlow resnet50 time: 57.7ms (= 11546.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15537.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.7ms)
OneFlow resnet50 time: 45.4ms (= 9089.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.3ms (= 16067.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 80.3ms / 45.4ms)
OneFlow resnet50 time: 39.1ms (= 7810.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13586.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 67.9ms / 39.1ms)
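For readers checking the numbers, the "Relative speed" figures above appear to be the PyTorch per-iteration time divided by the OneFlow per-iteration time, so a value above 1 means OneFlow was faster. The following is a minimal sketch of that arithmetic only; the function name `relative_speed` is hypothetical and this is not the CI script itself. It uses the unrounded totals, whereas the report divides the rounded per-iteration times, so the last digit may occasionally differ.

```python
def relative_speed(oneflow_total, pytorch_total, iterations):
    """Return PyTorch time per iteration divided by OneFlow time per iteration.

    Hypothetical helper for illustration; values above 1 mean OneFlow is faster.
    """
    oneflow_per_iter = oneflow_total / iterations
    pytorch_per_iter = pytorch_total / iterations
    return pytorch_per_iter / oneflow_per_iter


# resnet50, input_shape=[16, 3, 224, 224], totals in ms taken from the report above
print(f"{relative_speed(12826.9, 14133.7, 100):.2f}")  # prints 1.10

# swin dataloader, num_workers=1, totals in seconds taken from the report above
print(f"{relative_speed(52.668, 30.003, 200):.3f}")  # prints 0.570
```

Read this way, the resnet50 rows show OneFlow ahead of PyTorch, while the swin dataloader rows (all below 1) show it behind.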
CI failed when running job: cuda-misc. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12847.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.8ms (= 14080.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 140.8ms / 128.5ms)
OneFlow resnet50 time: 75.4ms (= 7539.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.3ms (= 8626.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 86.3ms / 75.4ms)
OneFlow resnet50 time: 48.3ms (= 9669.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 56.0ms (= 11206.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 56.0ms / 48.3ms)
OneFlow resnet50 time: 35.8ms (= 7159.5ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.2ms (= 9049.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.26 (= 45.2ms / 35.8ms)
OneFlow resnet50 time: 28.2ms (= 5631.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.4ms (= 7083.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.26 (= 35.4ms / 28.2ms)
OneFlow swin dataloader time: 0.261s (= 52.293s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.872s / 200, num_workers=1)
Relative speed: 0.571 (= 0.149s / 0.261s)
OneFlow swin dataloader time: 0.108s (= 21.589s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.341s / 200, num_workers=4)
Relative speed: 0.386 (= 0.042s / 0.108s)
OneFlow swin dataloader time: 0.062s (= 12.303s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.493s / 200, num_workers=8)
Relative speed: 0.365 (= 0.022s / 0.062s)
❌ OneFlow resnet50 time: 136.8ms (= 13681.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.3ms (= 16131.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.3ms / 136.8ms)
OneFlow resnet50 time: 84.7ms (= 8470.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.6ms (= 10455.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.23 (= 104.6ms / 84.7ms)
OneFlow resnet50 time: 57.7ms (= 11545.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.5ms (= 15702.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.5ms / 57.7ms)
OneFlow resnet50 time: 45.1ms (= 9020.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14202.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.57 (= 71.0ms / 45.1ms)
OneFlow resnet50 time: 39.0ms (= 7801.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.0ms (= 13401.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.0ms / 39.0ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8889/
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.2ms (= 12822.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.4ms (= 14140.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.4ms / 128.2ms)
OneFlow resnet50 time: 75.5ms (= 7553.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.7ms (= 8268.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 82.7ms / 75.5ms)
OneFlow resnet50 time: 48.3ms (= 9651.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 63.7ms (= 12732.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.32 (= 63.7ms / 48.3ms)
OneFlow resnet50 time: 35.8ms (= 7165.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.8ms (= 8769.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.22 (= 43.8ms / 35.8ms)
OneFlow resnet50 time: 28.2ms (= 5645.5ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.5ms (= 7709.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.37 (= 38.5ms / 28.2ms)
OneFlow swin dataloader time: 0.266s (= 53.261s / 200, num_workers=1)
PyTorch swin dataloader time: 0.152s (= 30.436s / 200, num_workers=1)
Relative speed: 0.571 (= 0.152s / 0.266s)
OneFlow swin dataloader time: 0.070s (= 14.062s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.252s / 200, num_workers=4)
Relative speed: 0.587 (= 0.041s / 0.070s)
OneFlow swin dataloader time: 0.040s (= 8.077s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.414s / 200, num_workers=8)
Relative speed: 0.547 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.8ms (= 13675.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.7ms (= 16367.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 163.7ms / 136.8ms)
OneFlow resnet50 time: 84.9ms (= 8490.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.1ms (= 10207.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.1ms / 84.9ms)
OneFlow resnet50 time: 57.8ms (= 11554.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.6ms (= 17723.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.6ms / 57.8ms)
OneFlow resnet50 time: 45.5ms (= 9091.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.9ms (= 14185.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.9ms / 45.5ms)
OneFlow resnet50 time: 38.8ms (= 7768.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15549.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 2.00 (= 77.7ms / 38.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12845.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.2ms (= 14219.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.2ms / 128.5ms)
OneFlow resnet50 time: 75.3ms (= 7528.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.1ms (= 8407.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 84.1ms / 75.3ms)
OneFlow resnet50 time: 48.3ms (= 9656.4ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 64.5ms (= 12909.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.34 (= 64.5ms / 48.3ms)
OneFlow resnet50 time: 35.9ms (= 7182.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 40.7ms (= 8134.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.13 (= 40.7ms / 35.9ms)
OneFlow resnet50 time: 28.1ms (= 5622.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.9ms (= 7371.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.31 (= 36.9ms / 28.1ms)
OneFlow swin dataloader time: 0.270s (= 54.000s / 200, num_workers=1)
PyTorch swin dataloader time: 0.155s (= 30.942s / 200, num_workers=1)
Relative speed: 0.573 (= 0.155s / 0.270s)
OneFlow swin dataloader time: 0.113s (= 22.620s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.245s / 200, num_workers=4)
Relative speed: 0.365 (= 0.041s / 0.113s)
OneFlow swin dataloader time: 0.040s (= 8.020s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.438s / 200, num_workers=8)
Relative speed: 0.553 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.4ms (= 13644.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.9ms (= 16094.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 160.9ms / 136.4ms)
OneFlow resnet50 time: 84.4ms (= 8439.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.6ms (= 10156.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 101.6ms / 84.4ms)
OneFlow resnet50 time: 58.1ms (= 11612.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15578.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.9ms / 58.1ms)
OneFlow resnet50 time: 45.3ms (= 9062.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.0ms (= 16009.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.77 (= 80.0ms / 45.3ms)
OneFlow resnet50 time: 39.1ms (= 7813.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.3ms (= 14250.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.82 (= 71.3ms / 39.1ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12851.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14358.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.6ms / 128.5ms)
OneFlow resnet50 time: 75.3ms (= 7531.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.5ms (= 8747.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.16 (= 87.5ms / 75.3ms)
OneFlow resnet50 time: 48.8ms (= 9766.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.3ms (= 11853.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.21 (= 59.3ms / 48.8ms)
OneFlow resnet50 time: 36.4ms (= 7276.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.8ms (= 8553.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 42.8ms / 36.4ms)
OneFlow resnet50 time: 28.2ms (= 5643.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 42.4ms (= 8483.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.50 (= 42.4ms / 28.2ms)
OneFlow swin dataloader time: 0.255s (= 51.070s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.256s / 200, num_workers=1)
Relative speed: 0.592 (= 0.151s / 0.255s)
OneFlow swin dataloader time: 0.072s (= 14.340s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.367s / 200, num_workers=4)
Relative speed: 0.583 (= 0.042s / 0.072s)
OneFlow swin dataloader time: 0.043s (= 8.580s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.472s / 200, num_workers=8)
Relative speed: 0.521 (= 0.022s / 0.043s)
❌ OneFlow resnet50 time: 136.7ms (= 13665.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.0ms (= 16201.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.0ms / 136.7ms)
OneFlow resnet50 time: 85.5ms (= 8550.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10264.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.6ms / 85.5ms)
OneFlow resnet50 time: 58.0ms (= 11599.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15686.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.4ms / 58.0ms)
OneFlow resnet50 time: 45.3ms (= 9050.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 81.2ms (= 16240.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.79 (= 81.2ms / 45.3ms)
OneFlow resnet50 time: 38.8ms (= 7757.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13635.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 68.2ms / 38.8ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12850.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14358.4ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.6ms / 128.5ms)
OneFlow resnet50 time: 75.3ms (= 7530.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8583.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 85.8ms / 75.3ms)
OneFlow resnet50 time: 48.9ms (= 9775.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 60.2ms (= 12041.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 60.2ms / 48.9ms)
OneFlow resnet50 time: 36.4ms (= 7274.7ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.0ms (= 8598.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.18 (= 43.0ms / 36.4ms)
OneFlow resnet50 time: 28.3ms (= 5663.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.9ms (= 7386.0ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.30 (= 36.9ms / 28.3ms)
OneFlow swin dataloader time: 0.269s (= 53.752s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.899s / 200, num_workers=1)
Relative speed: 0.556 (= 0.149s / 0.269s)
OneFlow swin dataloader time: 0.072s (= 14.492s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.997s / 200, num_workers=4)
Relative speed: 0.552 (= 0.040s / 0.072s)
OneFlow swin dataloader time: 0.040s (= 7.912s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.469s / 200, num_workers=8)
Relative speed: 0.565 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.7ms (= 13670.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.4ms (= 16143.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 161.4ms / 136.7ms)
OneFlow resnet50 time: 85.4ms (= 8541.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.8ms (= 10281.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.8ms / 85.4ms)
OneFlow resnet50 time: 58.8ms (= 11767.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.2ms (= 15635.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 78.2ms / 58.8ms)
OneFlow resnet50 time: 45.4ms (= 9078.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.8ms (= 15965.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 79.8ms / 45.4ms)
OneFlow resnet50 time: 39.3ms (= 7861.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.4ms (= 13071.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 65.4ms / 39.3ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.3ms (= 12832.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.3ms (= 14327.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 143.3ms / 128.3ms)
OneFlow resnet50 time: 75.3ms (= 7528.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 81.9ms (= 8192.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.09 (= 81.9ms / 75.3ms)
OneFlow resnet50 time: 49.0ms (= 9797.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 61.1ms (= 12227.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.25 (= 61.1ms / 49.0ms)
OneFlow resnet50 time: 36.3ms (= 7254.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.3ms (= 9258.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.28 (= 46.3ms / 36.3ms)
OneFlow resnet50 time: 28.5ms (= 5692.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.3ms (= 7062.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.24 (= 35.3ms / 28.5ms)
OneFlow swin dataloader time: 0.267s (= 53.488s / 200, num_workers=1)
PyTorch swin dataloader time: 0.152s (= 30.342s / 200, num_workers=1)
Relative speed: 0.567 (= 0.152s / 0.267s)
OneFlow swin dataloader time: 0.070s (= 13.936s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.420s / 200, num_workers=4)
Relative speed: 0.604 (= 0.042s / 0.070s)
OneFlow swin dataloader time: 0.040s (= 7.977s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.321s / 200, num_workers=8)
Relative speed: 0.542 (= 0.022s / 0.040s)
❌ OneFlow resnet50 time: 136.7ms (= 13669.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.6ms (= 16260.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 162.6ms / 136.7ms)
OneFlow resnet50 time: 84.9ms (= 8487.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 110.0ms (= 10996.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 110.0ms / 84.9ms)
OneFlow resnet50 time: 58.6ms (= 11719.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.9ms (= 15783.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.9ms / 58.6ms)
OneFlow resnet50 time: 45.8ms (= 9161.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.9ms (= 15389.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 76.9ms / 45.8ms)
OneFlow resnet50 time: 38.7ms (= 7747.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13611.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 68.1ms / 38.7ms)
Speed stats:
GPU Name: GeForce GTX 1080
✔️ OneFlow resnet50 time: 128.5ms (= 12850.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.6ms (= 14164.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.6ms / 128.5ms)
OneFlow resnet50 time: 75.5ms (= 7553.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 88.1ms (= 8808.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.17 (= 88.1ms / 75.5ms)
OneFlow resnet50 time: 49.3ms (= 9863.1ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.8ms (= 11167.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.13 (= 55.8ms / 49.3ms)
OneFlow resnet50 time: 36.6ms (= 7313.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.5ms (= 9496.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.30 (= 47.5ms / 36.6ms)
OneFlow resnet50 time: 28.6ms (= 5720.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.7ms (= 7544.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.32 (= 37.7ms / 28.6ms)
OneFlow swin dataloader time: 0.272s (= 54.329s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.049s / 200, num_workers=1)
Relative speed: 0.553 (= 0.150s / 0.272s)
OneFlow swin dataloader time: 0.070s (= 13.959s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.264s / 200, num_workers=4)
Relative speed: 0.592 (= 0.041s / 0.070s)
OneFlow swin dataloader time: 0.038s (= 7.646s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.513s / 200, num_workers=8)
Relative speed: 0.590 (= 0.023s / 0.038s)
❌ OneFlow resnet50 time: 136.8ms (= 13681.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.1ms (= 16407.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 164.1ms / 136.8ms)
OneFlow resnet50 time: 85.5ms (= 8551.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.9ms (= 11286.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 112.9ms / 85.5ms)
OneFlow resnet50 time: 58.7ms (= 11747.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 89.0ms (= 17790.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 89.0ms / 58.7ms)
OneFlow resnet50 time: 45.8ms (= 9155.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.2ms (= 14030.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 70.2ms / 45.8ms)
OneFlow resnet50 time: 39.3ms (= 7852.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.6ms (= 15110.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.92 (= 75.6ms / 39.3ms)