Donggeun Yu
@schegde did you solve it? There is no problem when padding=0 is used; the results differ from PyTorch only when padding=1 or higher.
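A common cause of this exact symptom (match at padding=0, divergence at padding>=1) is a padding-placement mismatch: PyTorch zero-pads symmetrically, while some implementations pad only on one side. This is a minimal pure-Python 1-D sketch of that effect, with made-up values; it is an illustration of the hypothesis, not a reproduction of the actual bug.

```python
# Minimal 1-D illustration: two conv implementations agree with padding=0
# but diverge with padding=1 when one pads symmetrically (PyTorch-style)
# and the other pads only on the right. Values are illustrative only.

def conv1d(x, k, pad_left, pad_right):
    """Valid cross-correlation over x zero-padded on each side."""
    xp = [0.0] * pad_left + list(x) + [0.0] * pad_right
    n, m = len(xp), len(k)
    return [sum(xp[i + j] * k[j] for j in range(m)) for i in range(n - m + 1)]

x = [1.0, 2.0, 3.0, 4.0]
k = [1.0, 0.0, -1.0]

symmetric = conv1d(x, k, 1, 1)   # pad both sides: [-2.0, -2.0, -2.0, 3.0]
right_only = conv1d(x, k, 0, 2)  # same output length, different alignment:
                                 # [-2.0, -2.0, 3.0, 4.0]
print(symmetric)
print(right_only)
```

With padding=0 both conventions compute the same windows, so outputs match exactly; the mismatch only appears once zeros are introduced, which matches the symptom above.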
What's the difference?
@sakjain92 FGPU's evaluation.sh ran successfully, but I couldn't confirm the GPU split with nvidia-smi. As described in TODO.md, GPU virtualization appears to split the GPU across containers. But...
Let's analyze it. Thank you~
@zhangmozhe I'm not sure how to make use of identity matrices. I tried torch.nn.Identity and torch.nn.Conv2d, but ONNX ignores those layers. As before, the channel size is *.
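For context, the "identity matrix" idea usually means a convolution whose kernel is 1 at the center and 0 elsewhere, so the layer reproduces its input exactly; exporters tend to fold away a literal no-op like nn.Identity, while an identity-weighted conv is still a real op in the graph. This is a pure-Python 1-D sketch of that kernel, as an assumption about the intent, not the actual model code:

```python
# Sketch of the "identity convolution" trick: a kernel that is 1 at its
# center and 0 elsewhere reproduces the input exactly ('same' padding),
# unlike nn.Identity, which an exporter can simply remove as a no-op.
# Values are illustrative only.

def conv1d_same(x, k):
    """'same' cross-correlation with symmetric zero padding (odd kernel)."""
    pad = len(k) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(xp[i + j] * k[j] for j in range(len(k)))
            for i in range(len(x))]

identity_kernel = [0.0, 1.0, 0.0]  # Dirac delta at the center
x = [3.0, 1.0, 4.0, 1.0, 5.0]
assert conv1d_same(x, identity_kernel) == x  # output equals input
```

In PyTorch terms this would correspond to a Conv2d whose weight is a per-channel Dirac kernel (e.g. initialized with `torch.nn.init.dirac_`), which ONNX exports as an ordinary Conv node.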
@qubvel [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15131446/profile_torch.bfloat16.txt) [profile_torch.float32.txt](https://github.com/huggingface/transformers/files/15131447/profile_torch.float32.txt)
> Based on profiling results it seems like conv2d is slow with `bfloat16`
>
> I found the following information regarding this issue:
>
> Comment 1 [pytorch/pytorch#57707 (comment)](https://github.com/pytorch/pytorch/issues/57707#issuecomment-1166656767)
> ...
No improvement... The results below were obtained with `TORCH_CUDNN_V8_API_ENABLED=1`. [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15144695/profile_torch.bfloat16.txt)
My apologies. Here it is: [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15150111/profile_torch.bfloat16.txt) It was generated with the command below. `TORCH_CUDNN_V8_API_ENABLED=1 python run_profiling.py`
@qubvel I don't think the problem is caused by an older version. I used nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04: CUDA 12.1, cuDNN 8.9, torch 2.1.