transformers-benchmarks
real Transformer TeraFLOPS on various GPUs
"…16-bit weight" should be "with 16-bit"
I ran the benchmark on an RTX 4090 workstation and got abnormally high bus-bandwidth numbers: ``` bash PyTorch version : 1.14.0a0+44dac51 CUDA version : 12.0 GPU : NVIDIA GeForce RTX 4090 Matrix Multiplication: n=128 n=512 n=2048 n=8192 torch.float32 1.048...
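For context, bus/memory bandwidth in benchmarks like this is usually estimated by timing a large tensor copy. A minimal sketch of such a measurement, assuming PyTorch is installed (the function name is illustrative, not from the repo):

```python
import time
import torch

def copy_bandwidth_gbs(size_mb=256, iters=10):
    """Estimate bandwidth (GB/s) of a tensor copy.

    A copy reads and writes each byte once, hence the factor of 2.
    Runs on the GPU if one is available, else on the CPU.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    n = size_mb * 1024 * 1024
    src = torch.empty(n, dtype=torch.uint8, device=device)
    dst = torch.empty_like(src)
    dst.copy_(src)               # warm-up so lazy init doesn't skew timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters
    return 2 * n / elapsed / 1e9
```

If a number like this comes out far above the card's spec sheet, the usual suspects are missing `torch.cuda.synchronize()` calls (timing only the kernel launch) or a copy that the framework elides.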
I tried to run your "transformers.ipynb" with Python, but it raised an error. How can I solve it?
A question for Mu: 1. In the wiki linked from the notebook, is the theoretical value of 40 for the 3090 Ti taken from the boosted core value ([39.997](https://en.wikipedia.org/wiki/GeForce_30_series#:~:text=33.546-,(39.997),-0.524%0A(0.625))) in the table? 2. On many of my own 3090s, running with CUDA 11.7 and nvidia-driver 525, I only measure 24 TFLOPS, noticeably below both the base (29.3) and boost (35.6) theoretical values. Was the 3090 Ti used in the notebook overclocked? What settings are needed to get FLOPS close to the theoretical value?
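To check where your own card lands, measured TFLOPS can be estimated by timing a large matrix multiplication, which costs 2n³ FLOPs for an n×n matmul. A minimal sketch assuming PyTorch (warm-up matters, since GPU clocks ramp up under load, which is one reason cold runs undershoot the theoretical value):

```python
import time
import torch

def matmul_tflops(n=8192, dtype=torch.float16, iters=10):
    """Time an n x n matmul and report achieved TFLOPS (2*n^3 FLOPs)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(n, n, dtype=dtype, device=device)
    b = torch.randn(n, n, dtype=dtype, device=device)
    for _ in range(3):           # warm-up: let clocks ramp and kernels compile
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all kernels before stopping the clock
    elapsed = (time.perf_counter() - start) / iters
    return 2 * n ** 3 / elapsed / 1e12
```

Results also depend on power limits and thermal throttling; pinning the GPU clock (e.g. with `nvidia-smi --lock-gpu-clocks`) makes runs more repeatable.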
The transformers library has changed its internal structure, and BertLayer is no longer accessible through the old import paths.
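One way to cope with the moved import is to probe the candidate module paths at runtime. A sketch with a hypothetical helper (`locate_class` is not part of transformers; the two paths assume the post-reorganization layout under `transformers.models.bert` and the older flat layout):

```python
import importlib

def locate_class(name, candidate_modules):
    """Return a class from the first candidate module that exposes it.

    Useful when a class moved between library versions, as BertLayer
    did when the transformers repo was reorganized.
    """
    for path in candidate_modules:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name} not found in {candidate_modules}")

# Usage for the issue above: try the newer layout first, then the old one.
BERT_LAYER_PATHS = (
    "transformers.models.bert.modeling_bert",  # reorganized layout
    "transformers.modeling_bert",              # older flat layout
)
# BertLayer = locate_class("BertLayer", BERT_LAYER_PATHS)
```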
Hi, thank you for the great benchmark as well as the videos explaining the papers! Currently the table/explanations say that the actual matrix-multiplication value is much smaller than the theoretical one. But...