
ZeroQuant not compressing and making BERT slower

Open • K2triinK opened this issue 3 years ago • 1 comment

Describe the bug

I was expecting a compressed and faster BERT model after running the BERT ZeroQuant example in DeepSpeedExamples. However, the clean model isn't any smaller (still 417.7 MB) or faster (in fact, it's slower) than the original.

To Reproduce

  1. Go to Google Colab and change to GPU runtime
  2. Run the following:
       pip install deepspeed==0.7.0
       git clone https://github.com/microsoft/DeepSpeedExamples
       cd DeepSpeedExamples/model_compression/bert
       (In the zero_quant.sh file, change master_port (e.g. to 9995), task to sst2, and eval_batch_size to 32 (otherwise you'll get CUDA out of memory).)
       bash bash_script/ZeroQuant/zero_quant.sh

Expected behavior

I expected the final clean model to be a compressed version of the original one, and therefore smaller and faster, but it isn't.
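
For reference, a minimal sketch of how the size and speed comparison can be made. It is not taken from the DeepSpeedExamples scripts; the helper names `file_size_mb` and `median_latency_ms` are made up for illustration, and the sketch assumes the original and clean checkpoints are ordinary files on disk.

```python
import os
import statistics
import time


def file_size_mb(path):
    """Return the size of a file on disk in megabytes (10**6 bytes)."""
    return os.path.getsize(path) / 1e6


def median_latency_ms(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return the median wall-clock time in ms.

    For a GPU model, fn should wait for the device to finish before
    returning; otherwise the timing mostly reflects launch overhead.
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Calling `file_size_mb` on both checkpoint files and `median_latency_ms` on a function that runs one evaluation batch through each model gives the two numbers compared in this report. Using the same batch and batch size for both models keeps the latency comparison fair.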

ds_report output image

System info (please complete the following information):

  • OS: Ubuntu 18.04.6 LTS
  • 1 Tesla T4 GPU
  • Python version: tried with both 3.7.13 and 3.9

K2triinK, Aug 19 '22 06:08

Hey @K2triinK

I am wrapping up this PR, which answers some of your questions, such as the model size reduction. Regarding the kernels, we are working on a plan to release them soon so that you can give them a try. Thanks, Reza

RezaYazdaniAminabadi, Aug 19 '22 17:08
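
As a general illustration only (it does not describe DeepSpeed's internals or the PR mentioned above): if quantized weights are written back as 32-bit floats, which is an assumption about this case and not something confirmed in the thread, the checkpoint takes the same number of bytes as before. The size drops only when the low-bit integer codes themselves are stored, and a speedup additionally needs kernels that compute on those codes. The sketch below uses plain symmetric int8 quantization, which may differ from the ZeroQuant scheme, and all names in it are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to 32-bit floats (simulated quantization)."""
    return array("f", (c * scale for c in codes))


# Toy "weight tensor" with 10,000 values in [-1, 1).
weights = array("f", (((i % 200) - 100) / 100.0 for i in range(10_000)))
codes, scale = quantize_int8(weights)
simulated = dequantize(codes, scale)

fp32_bytes = len(weights) * weights.itemsize          # original storage
simulated_bytes = len(simulated) * simulated.itemsize  # quantized values kept as floats
int8_bytes = len(codes) * codes.itemsize               # integer codes stored directly

print(fp32_bytes, simulated_bytes, int8_bytes)
```

Here the simulated-quantization array is exactly as large as the original, while the int8 codes take a quarter of the space (plus one scale value per tensor).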