transformers-benchmarks
real Transformer TeraFLOPS on various GPUs
"…16-bit weight" should be "with 16-bit"
I ran the benchmark on an RTX 4090 workstation and got abnormally high bus-bandwidth numbers: ``` bash PyTorch version : 1.14.0a0+44dac51 CUDA version : 12.0 GPU : NVIDIA GeForce RTX 4090 Matrix Multiplication: n=128 n=512 n=2048 n=8192 torch.float32 1.048...
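For context, bus/memory bandwidth in benchmarks like this is usually estimated by timing a large tensor copy. A minimal sketch of such a measurement, assuming PyTorch is installed (the function name is illustrative, not from the repo):

```python
import time
import torch

def copy_bandwidth_gbs(size_mb=256, iters=10):
    """Estimate bandwidth (GB/s) of a tensor copy.

    A copy reads and writes each byte once, hence the factor of 2.
    Runs on the GPU if one is available, else on the CPU.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    n = size_mb * 1024 * 1024
    src = torch.empty(n, dtype=torch.uint8, device=device)
    dst = torch.empty_like(src)
    dst.copy_(src)               # warm-up so lazy init doesn't skew timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters
    return 2 * n / elapsed / 1e9
```

If a number like this comes out far above the card's spec sheet, the usual suspects are missing `torch.cuda.synchronize()` calls (timing only the kernel launch) or a copy that the framework elides.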
I tried to run your "transformers.ipynb" with Python, but it raised an error. How can I solve it?
A question for Mu: 1. In the wiki linked from the notebook, is the theoretical value of 40 for the 3090 Ti taken from the boosted core value ([39.997](https://en.wikipedia.org/wiki/GeForce_30_series#:~:text=33.546-,(39.997),-0.524%0A(0.625))) in the table? 2. On many of my own 3090s, running with CUDA 11.7 and nvidia-driver 525, I only measure 24 TFLOPS, noticeably below both the base (29.3) and boost (35.6) theoretical values. Was the 3090 Ti used in the notebook overclocked? What settings are needed to get FLOPS close to the theoretical value?
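To check where your own card lands, measured TFLOPS can be estimated by timing a large matrix multiplication, which costs 2n³ FLOPs for an n×n matmul. A minimal sketch assuming PyTorch (warm-up matters, since GPU clocks ramp up under load, which is one reason cold runs undershoot the theoretical value):

```python
import time
import torch

def matmul_tflops(n=8192, dtype=torch.float16, iters=10):
    """Time an n x n matmul and report achieved TFLOPS (2*n^3 FLOPs)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(n, n, dtype=dtype, device=device)
    b = torch.randn(n, n, dtype=dtype, device=device)
    for _ in range(3):           # warm-up: let clocks ramp and kernels compile
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all kernels before stopping the clock
    elapsed = (time.perf_counter() - start) / iters
    return 2 * n ** 3 / elapsed / 1e12
```

Results also depend on power limits and thermal throttling; pinning the GPU clock (e.g. with `nvidia-smi --lock-gpu-clocks`) makes runs more repeatable.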
The transformers library has changed its internal structure, and BertLayer is no longer accessible through the old import paths.
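One way to cope with the moved import is to probe the candidate module paths at runtime. A sketch with a hypothetical helper (`locate_class` is not part of transformers; the two paths assume the post-reorganization layout under `transformers.models.bert` and the older flat layout):

```python
import importlib

def locate_class(name, candidate_modules):
    """Return a class from the first candidate module that exposes it.

    Useful when a class moved between library versions, as BertLayer
    did when the transformers repo was reorganized.
    """
    for path in candidate_modules:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name} not found in {candidate_modules}")

# Usage for the issue above: try the newer layout first, then the old one.
BERT_LAYER_PATHS = (
    "transformers.models.bert.modeling_bert",  # reorganized layout
    "transformers.modeling_bert",              # older flat layout
)
# BertLayer = locate_class("BertLayer", BERT_LAYER_PATHS)
```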
Hi, thank you for the great benchmark as well as the videos explaining the papers! Currently the table/explanations say that the actual matrix-multiplication value is much smaller than the theoretical one. But...