Wang, Xiao issues

Results: 14 issues of Wang, Xiao

Add CUDA Graph and AOT Autograd support

Add CUDA Graph support with `--cuda-graph` and AOT Autograd support with `--aot-autograd` to **benchmark.py** and **train.py**. The CUDA Graph workflow in train.py might be a bit overcomplicated. Related: https://github.com/rwightman/pytorch-image-models/issues/1244

[FEATURE] Run benchmark `--model-list` in subprocess

**Is your feature request related to a problem? Please describe.** While benchmarking timm-models with `benchmark.py`, I tried the following two approaches: 1. `python benchmark.py --model-list _models.txt -b 128`...

enhancement
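The title asks for running each `--model-list` entry in a subprocess. A minimal sketch of that isolation launches every model in its own Python interpreter. It assumes `benchmark.py` accepts `--model <name>` (plus the `-b 128` used above) and that the list file holds one model name per line; `parse_model_list`, `benchmark_cmd`, and `run_models_in_subprocesses` are hypothetical helpers, not part of `benchmark.py`.

```python
import subprocess
import sys


def parse_model_list(text):
    """Hypothetical helper: model names from a list file, one per line.

    Blank lines and lines starting with '#' are skipped.
    """
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def benchmark_cmd(name, extra_args=("-b", "128")):
    """Hypothetical helper: command that benchmarks a single model.

    Assumption: benchmark.py accepts --model <name>; extra_args are forwarded unchanged.
    """
    return [sys.executable, "benchmark.py", "--model", name, *extra_args]


def run_models_in_subprocesses(models, make_cmd=benchmark_cmd, timeout=None):
    """Run each model in a fresh Python process and collect (returncode, stdout).

    A crash, OOM, or leftover CUDA/compiler state in one model cannot leak into
    the next one, because every model starts from a brand-new interpreter.
    """
    results = {}
    for name in models:
        proc = subprocess.run(
            make_cmd(name), capture_output=True, text=True, timeout=timeout
        )
        results[name] = (proc.returncode, proc.stdout)
    return results
```

For example, passing `parse_model_list(Path("_models.txt").read_text())` to `run_models_in_subprocesses` would benchmark every entry in a separate process, so a failing model only shows up as a non-zero return code for that entry instead of aborting the whole run.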

[WIP] Use sync-free cuda event timing in benchmark

_This PR does not represent the final form of the torchbench code changes; it is rather meant as a discussion of how we should implement a sync-free CUDA event timing mechanism._ This...

cla signed

Auto-clicking monsters leads to a letter from 鬼使黑 ("Ghost Messenger Black")

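Today I tested auto-clicking monsters on two accounts, and there is a risk of getting caught by 鬼使黑. It seems that while you still have stamina left, the game deliberately pops up a window saying your stamina has run out. The script did not stop, and when I came back I had already received a letter from 鬼使黑. _Originally posted by @Milo-dd in https://github.com/society765/yys-auto-yuhun/issues/7#issuecomment-489972210_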

tf_efficientnet_b0_ap model was removed but is still in the doc

**Describe the bug** The tf_efficientnet_b0_ap model was removed in https://github.com/rwightman/pytorch-image-models/commit/6a01101905e78007e5396f5ffdaae0c4725ba72c#diff-27c2bbd967991cbb5264f93cb5da34895fdab02424b2cc8c63d3d0768e65d47aL1833, but is still referenced in the docs at https://github.com/rwightman/pytorch-image-models/blob/6a01101905e78007e5396f5ffdaae0c4725ba72c/docs/models/advprop.md#how-do-i-use-this-model-on-an-image **To Reproduce** Steps to reproduce...

bug
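One way to catch this kind of doc drift automatically is to compare the model names used in `timm.create_model(...)` calls in the docs against the names the library still registers. This is a sketch, assuming the docs spell calls as `timm.create_model('<name>', ...)`; `stale_model_references` is a hypothetical helper, and in practice `registered_models` could come from `timm.list_models()` of the installed version.

```python
import re

# Assumption: doc snippets call timm.create_model('<name>', ...) or with double quotes.
CREATE_MODEL_RE = re.compile(r"""timm\.create_model\(\s*['"]([^'"]+)['"]""")


def stale_model_references(doc_text, registered_models):
    """Hypothetical helper: names used in create_model calls that are no longer registered.

    Returns each stale name once, in the order it first appears in doc_text.
    """
    registered = set(registered_models)
    stale = []
    for name in CREATE_MODEL_RE.findall(doc_text):
        if name not in registered and name not in stale:
            stale.append(name)
    return stale
```

Running it over every file under `docs/models/` and failing CI on a non-empty result would flag a page like `advprop.md` as soon as a model is removed.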

Run make_wheel_record in parallel in the background

Running `make_wheel_record` in parallel in the background saves ~8 minutes on my 12-core Intel machine for a full CUDA wheel build; it basically makes the loop "instant". https://unix.stackexchange.com/questions/42544/does-redirecting-output-to-a-file-apply-a-lock-on-the-file/42564#42564 This answer...

cla signed

ciflow/binaries

[Feature request] Make this `ADD DEPENDENCIES INTO THE WHEEL` part in manywheel/build_common.sh a standalone script

[Feature request] Make this (amazing) `ADD DEPENDENCIES INTO THE WHEEL` part in _manywheel/build_common.sh_ a standalone script so that it can be reused when a torch wheel is built from other...

Torchdynamo XLNetLMHeadModel AMP+NHWC fails with tensor_inputs_to_check.size() INTERNAL ASSERT FAILED

### 🐛 Describe the bug

Reproduce:

```shell
root@516d815b994f:/workspace/torch-benchmark/torchdynamo# python benchmarks/huggingface.py --training -d cuda --fast --accuracy-aot-ts-mincut --nvfuser --skip-accuracy-check --generate-aot-autograd-stats --isolate --amp --channels-last -k XLNetLMHeadModel
WARNING:root:Running smaller batch size=8 for XLNetLMHeadModel, orig...
```

TIMM tresnet_l AoT_autograd fails with lazy allocation issue

### 🐛 Describe the bug

Reproduce:

```shell
root@c73318efaa9b:/workspace/timm-models/pytorch-image-models# python -u benchmark.py --bench train --model tresnet_l --img-size 224 -b 128 --fuser nvfuser --aot-autograd
Benchmarking in float32 precision. NCHW layout. torchscript disabled...
```

TIMM tresnet_l model fails with Python builtin <built-in method apply ... (InplaceABN.apply?)> is currently not supported in TorchScript

### 🐛 Describe the bug

Reproduce:

```shell
root@c73318efaa9b:/workspace/timm-models/pytorch-image-models# python -u benchmark.py --bench train --model tresnet_l --img-size 224 -b 128 --torchscript --fuser nvfuser
Benchmarking in float32 precision. NCHW layout. torchscript enabled...
```