Donggeun Yu

Showing 22 comments by Donggeun Yu

@schegde did you solve it? There is no problem when padding=0 is used; the results differ from PyTorch only when padding=1 or higher.
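That symptom is consistent with a border-handling bug: with a 3×3 kernel, padding only changes the border rows and columns of the output, while the interior must agree with the padding=0 result. A minimal NumPy sketch of that invariant (purely illustrative — the actual CUDA kernel from the issue is not reproduced here):

```python
import numpy as np

def conv2d(x, k, pad=0):
    # Naive 2D cross-correlation with zero padding, for illustration only.
    if pad:
        x = np.pad(x, pad)
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

y0 = conv2d(x, k, pad=0)  # 6x6 output
y1 = conv2d(x, k, pad=1)  # 8x8 output

# The interior of the padded output must equal the unpadded output;
# a kernel that only diverges when padding > 0 is mishandling borders.
assert np.allclose(y1[1:-1, 1:-1], y0)
```

Checking this interior slice against a PyTorch reference is a quick way to confirm the mismatch is confined to the padded border.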

@sakjain92 FGPU's evaluation.sh ran successfully, but I couldn't confirm the GPU split with nvidia-smi. As described in TODO.md, GPU virtualization seems to be able to split the GPU across containers. But...

Let's analyze it. Thank you~

@zhangmozhe I'm not sure how to utilize identity matrices. I tried using `torch.nn.Identity` and `torch.nn.Conv2d`, but the ONNX exporter drops those layers. As before, the channel size is *.

@qubvel [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15131446/profile_torch.bfloat16.txt) [profile_torch.float32.txt](https://github.com/huggingface/transformers/files/15131447/profile_torch.float32.txt)

> Based on profiling results it seems like conv2d is slow with `bfloat16`
>
> I found the following information regarding this issue:
>
> Comment 1 [pytorch/pytorch#57707 (comment)](https://github.com/pytorch/pytorch/issues/57707#issuecomment-1166656767)
> ...

No improvement... The results below were obtained with `TORCH_CUDNN_V8_API_ENABLED=1`. [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15144695/profile_torch.bfloat16.txt)

I'm so sorry. Here it is: [profile_torch.bfloat16.txt](https://github.com/huggingface/transformers/files/15150111/profile_torch.bfloat16.txt) It was generated with `TORCH_CUDNN_V8_API_ENABLED=1 python run_profiling.py`.
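For reference, prefixing a command with `VAR=value` sets the variable only for that one process, without changing the parent shell. A quick way to confirm the child process actually sees the flag (the variable name is from the run above; the `python3 -c` check itself is just illustrative):

```shell
# Set the flag for a single command and print what the child process sees;
# the parent shell's environment is left untouched.
TORCH_CUDNN_V8_API_ENABLED=1 python3 -c 'import os; print(os.environ.get("TORCH_CUDNN_V8_API_ENABLED"))'
```

If this prints `1` but profiling is unchanged, the variable is reaching the process and the lack of improvement is not an environment-setup mistake.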

@qubvel I don't think the problem is caused by an older version. I used the `nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04` image: CUDA 12.1, cuDNN 8.9, torch 2.1.
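One pitfall when ruling out version problems: dotted version strings must be compared numerically, not lexically, since as plain strings `"8.9"` sorts above `"12.1"`. A small stdlib-only sketch; the minimum versions used below are illustrative assumptions, not requirements stated in the thread:

```python
def at_least(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, not lexically."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

# Lexical comparison gets this wrong: "8.9" > "12.1" as strings.
assert "8.9" > "12.1"
assert not at_least("8.9", "12.1")

# Versions from the container image above, against assumed minimums:
assert at_least("12.1", "11.8")  # CUDA
assert at_least("8.9", "8.6")    # cuDNN
assert at_least("2.1", "2.0")    # torch
```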