IndexError: index -1 is out of bounds for dimension 1 with size 0
System Info
PC: M2
transformers==4.31.0.dev0
Reference: https://github.com/openai/whisper/discussions/1478
I hit the following error:
in <module>:9

     6 prompt_ids = processor.get_prompt_ids(prompt)
     7
     8 forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
  ❱  9 predicted_ids = model.generate(input_features, prompt_ids=prompt_ids, forced_decoder_ids=forced_decoder_ids,
    10                                max_new_tokens=3000)
    11 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    12 print("耗时:", time.time() - start_time, transcription)

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:1664 in generate

  1661         if generation_config.return_timestamps:
  1662             logits_processor = [WhisperTimeStampLogitsProcessor(generation_config)]
  1663
❱ 1664         return super().generate(
  1665             inputs,
  1666             generation_config,
  1667             logits_processor,

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py:115 in decorate_context

   112     @functools.wraps(func)
   113     def decorate_context(*args, **kwargs):
   114         with ctx_factory():
❱  115             return func(*args, **kwargs)
   116
   117     return decorate_context
   118

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/generation/utils.py:1522 in generate

  1519                 )
  1520
  1521             # 11. run greedy search
❱ 1522             return self.greedy_search(
  1523                 input_ids,
  1524                 logits_processor=logits_processor,
  1525                 stopping_criteria=stopping_criteria,

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/generation/utils.py:2349 in greedy_search

  2346             if synced_gpus and this_peer_finished:
  2347                 continue  # don't waste resources running the code we don't need
  2348
❱ 2349             next_token_logits = outputs.logits[:, -1, :]
  2350
  2351             # pre-process distribution
  2352             next_tokens_scores = logits_processor(input_ids, next_token_logits)
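The failure in the last frame can be reproduced in isolation: once the sequence dimension of the logits tensor is 0, indexing position -1 raises exactly this `IndexError`. A minimal sketch (the vocabulary size is an arbitrary example value):

```python
import torch

# Logits with batch size 1, zero sequence positions, and an example vocab size:
# this is effectively the shape the model returns once generation has no room
# left for new tokens.
logits = torch.empty(1, 0, 51865)

try:
    next_token_logits = logits[:, -1, :]  # what greedy_search does
except IndexError as e:
    print(e)  # -> index -1 is out of bounds for dimension 1 with size 0
```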
Both of the following snippets trigger the error.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa
import soundfile
import torchaudio
base_model = "/Users/ddd/Documents/github/whisper-large-v2"
processor = WhisperProcessor.from_pretrained(base_model,
                                             language="zh",
                                             task="transcribe",
                                             local_files_only=True)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
# load the model
model = WhisperForConditionalGeneration.from_pretrained(base_model,
                                                        device_map="auto",
                                                        local_files_only=True).half()
model.eval()
audio_file = "/Users/ddd/Documents/gitlab/llm-train/yuyin/simple.m4a"
src_signal, sample_rate = librosa.load(audio_file, sr=16000)
start = 23196064
end = 23364576
src_signal_demo = src_signal[start:end]
input_features = processor(src_signal_demo, sampling_rate=sample_rate, return_tensors="pt").input_features.half().to("mps")
prompt = '以下是普通话的句子'  # "The following are Mandarin sentences"
prompt_ids = processor.get_prompt_ids(prompt)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
predicted_ids = model.generate(input_features, prompt_ids=prompt_ids, forced_decoder_ids=forced_decoder_ids,
                               max_new_tokens=3000)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
from transformers import pipeline
pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device="mps",
    chunk_length_s=30,  # if not set, generation is capped at `max_new_tokens`
    generate_kwargs={"num_beams": 5},  # matches the openai-whisper default
)
audio_file = "/Users/ddd/Documents/gitlab/llm-train/yuyin/simple.m4a"
src_signal, sample_rate = librosa.load(audio_file, sr=16000)
start = 23196064
end = 23364576
src_signal_demo = src_signal[start:end]
prompt = '以下是普通话的句子'  # "The following are Mandarin sentences"
prompt_ids = pipe.tokenizer.get_prompt_ids(prompt, return_tensors="pt")
result = pipe(src_signal_demo, generate_kwargs={"language": "zh", "task": "transcribe", "prompt_ids": prompt_ids})
print(result["text"])
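For context, the sample indices used in both snippets select roughly a 10.5-second slice (at the 16 kHz rate `librosa.load` resamples to), so the clip length itself is well within a single 30-second Whisper window:

```python
SR = 16000  # sampling rate passed to librosa.load in the snippets above
start, end = 23196064, 23364576

duration_s = (end - start) / SR
print(f"slice length: {duration_s:.2f} s")                    # slice length: 10.53 s
print(f"slice position: {start / SR:.1f}-{end / SR:.1f} s")   # position within the file
```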
Who can help?
No response
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
Reproduction
- load the audio
- slice the audio
- add the prompt
- transcribe the sliced audio, which raises the error
Expected behavior
The audio should be transcribed to text.
cc @gante @sanchit-gandhi
Hey @diaojunxian 👋
Your reproducer contains private data, which means we can't easily reproduce on our end -- would you be able to share the audio file with us OR rewrite the reproducer from public data?
At a first glance, because of the thrown exception (IndexError: index -1 is out of bounds for dimension 1 with size 0 in next_token_logits = outputs.logits[:, -1, :]), I'd bet something went wrong at preprocessing time :D bad model input shapes -> bad model output shapes
> Hey @diaojunxian 👋
>
> Your reproducer contains private data, which means we can't easily reproduce on our end -- would you be able to share the audio file with us OR rewrite the reproducer from public data?
>
> At a first glance, because of the thrown exception (`IndexError: index -1 is out of bounds for dimension 1 with size 0` in `next_token_logits = outputs.logits[:, -1, :]`), I'd bet something went wrong at preprocessing time :D bad model input shapes -> bad model output shapes
I can send it to you privately, but it cannot be published on the internet, so only you would be able to verify the bug personally. Would that work?
@diaojunxian yeah, that would be helpful. You can send it to the email attached to my GH account ([email protected])
You are using an unmodified openai/whisper-large-v2, correct?
> start = 23196064
> end = 23364576
Yes, an unmodified whisper-large-v2, and I have sent the audio to your Gmail.
Hey @diaojunxian 👋
In both snippets, the problem is the same: as soon as the model tries to generate beyond its maximum length, the output sequence dimension becomes 0, causing the exception.
I've found the issue and will open a PR to fix it. The second example you provided works perfectly after the fix. The first one will probably still fail because of max_new_tokens=3000 (Whisper's maximum length is 448 tokens, and generation defaults to that maximum, so you probably shouldn't set max_new_tokens at all :) )
After the PR linked above gets merged, you can install from main and it should work :)
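For anyone hitting the same wall: the 448-token cap covers everything in the decoder's sequence, so the budget for the transcription itself shrinks by the prompt tokens and the forced decoder IDs. A back-of-the-envelope sketch (448 is Whisper's decoder cap, `max_target_positions` in the model config; the token counts below are made-up example values):

```python
MAX_TARGET_POSITIONS = 448  # Whisper's decoder length cap

def transcription_budget(n_prompt_tokens: int, n_forced_ids: int) -> int:
    """Tokens left for the transcription text itself once the prompt and
    the forced decoder IDs are accounted for."""
    return MAX_TARGET_POSITIONS - n_prompt_tokens - n_forced_ids

# e.g. a 10-token prompt plus a few forced IDs (start/language/task tokens)
print(transcription_budget(10, 3))  # 435
```

Requesting `max_new_tokens=3000` blows far past this cap, which is why letting `generate()` fall back to its default length is the right call here.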