
[BUG] GPT-NeoX Inference returns nonsense

Open ppetrushkov opened this issue 2 years ago • 1 comments

Describe the bug Running inference with DeepSpeed on the GPT-NeoX 20B model produces garbage output, indicating an implementation bug.

To Reproduce The issue can be reproduced with the example script: deepspeed --num_gpus 2 inference_test.py --name EleutherAI/gpt-neox-20b --batch_size 1 --ds_inference --use_kernel --use_meta_tensor --checkpoint_path <downloaded-checkpoint>

Which prints:

in=DeepSpeed is a machine learning framework
out=DeepSpeed is a machine learning framework BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromStringamssymb errnoErr errnoErr BytePtrFromString BytePtrFromString BytePtrFromStringamsfonts BytePtrFromStringblockList BytePtrFromString errnoErr BytePtrFromString errnoErr BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString errnoErr errnoErr BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString errnoErr BytePtrFromString errnoErr errnoErr errnoErr BytePtrFromString BytePtrFromString BytePtrFromString BytePtrFromString errnoErr BytePtrFromString BytePtrFromString errnoErr BytePtrFromString errnoErr BytePtrFromStringblockList BytePtrFromString errnoErr BytePtrFromStringblockList

I tried with deepspeed==0.8.1 and 0.8.3. To load the model checkpoint, one also needs to change split_qkv=False to split_qkv=True here; otherwise the error reported in this issue appears. From what I could tell, DeepSpeed loads the model weights correctly with this change, but something then goes wrong in the CUDA kernel.
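For context on why a wrong split_qkv setting would yield fluent-looking nonsense rather than a crash: GPT-NeoX stores its fused query_key_value weight interleaved per attention head (Q, K, V rows for head 0, then head 1, and so on), whereas a GPT-2-style layout concatenates all Q rows, then all K rows, then all V rows. A kernel that de-fuses the weight under the wrong layout assumption silently mixes Q and K weights across heads. The sketch below is an illustration of that mismatch, not DeepSpeed code; the row labels and dimensions are made up for the example.

```python
# Rows are labeled by origin: "q0" = a Q row of head 0, "k1" = a K row
# of head 1, etc. Dimensions are toy values for illustration.
heads, head_dim = 2, 4

# GPT-NeoX fused layout: for each head, its Q rows, then K rows, then V rows.
fused = []
for h in range(heads):
    for part in ("q", "k", "v"):
        fused += [f"{part}{h}"] * head_dim

# Correct de-fusing (per-head interleaved): take the first head_dim rows
# of each head's 3*head_dim block.
q_rows = [fused[h * 3 * head_dim + i] for h in range(heads) for i in range(head_dim)]
assert q_rows == ["q0"] * 4 + ["q1"] * 4

# Naive block split (GPT-2-style assumption): first third of all rows.
# It silently picks up head 0's K rows as if they were head 1's Q rows.
q_wrong = fused[: len(fused) // 3]
assert q_wrong == ["q0"] * 4 + ["k0"] * 4
```

With every head's attention computed from the wrong weight rows, the model still produces syntactically token-like output, which matches the repetitive garbage shown above.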

Expected behavior Correct model inference.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.7/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/home/ppetrushkov/.local/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.8.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

System info (please complete the following information):

  • 2X V100 GPUs
  • transformers==4.27.2
  • Python 3.7.15

ppetrushkov avatar Mar 27 '23 16:03 ppetrushkov

+1 to be in the loop.

satpalsr avatar Mar 31 '23 15:03 satpalsr