
IndexError: index -1 is out of bounds for dimension 1 with size 0

Open · ryzn0518 opened this issue 2 years ago · 1 comment

System Info

PC: M2

transformers==4.31.0.dev0

Refer to: https://github.com/openai/whisper/discussions/1478

I hit the following error:

```
in <module>:9

     6 prompt_ids = processor.get_prompt_ids(prompt)
     7
     8 forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
❱    9 predicted_ids = model.generate(input_features, prompt_ids=prompt_ids, forced_decoder_ids=forced_decoder_ids,
    10                                max_new_tokens=3000)
    11 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    12 print("耗时:", time.time() - start_time, transcription)

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/models/whisper/modeling_whisper.py:1664 in generate

  1661     if generation_config.return_timestamps:
  1662         logits_processor = [WhisperTimeStampLogitsProcessor(generation_config)]
  1663
❱ 1664     return super().generate(
  1665         inputs,
  1666         generation_config,
  1667         logits_processor,

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py:115 in decorate_context

   112     @functools.wraps(func)
   113     def decorate_context(*args, **kwargs):
   114         with ctx_factory():
❱  115             return func(*args, **kwargs)
   116
   117     return decorate_context
   118

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/generation/utils.py:1522 in generate

  1519                 )
  1520
  1521             # 11. run greedy search
❱ 1522             return self.greedy_search(
  1523                 input_ids,
  1524                 logits_processor=logits_processor,
  1525                 stopping_criteria=stopping_criteria,

/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/transformers/generation/utils.py:2349 in greedy_search

  2346             if synced_gpus and this_peer_finished:
  2347                 continue  # don't waste resources running the code we don't need
  2348
❱ 2349             next_token_logits = outputs.logits[:, -1, :]
  2350
  2351             # pre-process distribution
  2352             next_tokens_scores = logits_processor(input_ids, next_token_logits)
```

Both of the following snippets trigger the error.

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa
import soundfile
import torchaudio

base_model = "/Users/ddd/Documents/github/whisper-large-v2"
processor = WhisperProcessor.from_pretrained(base_model,
                                             language="zh",
                                             task="transcribe",
                                             local_files_only=True)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")

# load the model
model = WhisperForConditionalGeneration.from_pretrained(base_model,
                                                        device_map="auto",
                                                        local_files_only=True).half()
model.eval()

audio_file = "/Users/ddd/Documents/gitlab/llm-train/yuyin/simple.m4a"

src_signal, sample_rate = librosa.load(audio_file, sr=16000)

start = 23196064
end = 23364576

src_signal_demo = src_signal[start:end]
input_features = processor(src_signal_demo, sampling_rate=sample_rate, return_tensors="pt").input_features.half().to("mps")

prompt = '以下是普通话的句子'  # "The following is a Mandarin sentence"

prompt_ids = processor.get_prompt_ids(prompt)

forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
predicted_ids = model.generate(input_features, prompt_ids=prompt_ids, forced_decoder_ids=forced_decoder_ids,
                               max_new_tokens=3000)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
```python
from transformers import pipeline
import librosa

pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device="mps",
    chunk_length_s=30,  # if not specified, only generates as much as `max_new_tokens`
    generate_kwargs={"num_beams": 5},  # same setting as the "openai whisper" default
)

audio_file = "/Users/ddd/Documents/gitlab/llm-train/yuyin/simple.m4a"

src_signal, sample_rate = librosa.load(audio_file, sr=16000)

start = 23196064
end = 23364576

src_signal_demo = src_signal[start:end]

prompt = '以下是普通话的句子'  # "The following is a Mandarin sentence"
prompt_ids = pipe.tokenizer.get_prompt_ids(prompt, return_tensors="pt")
result = pipe(src_signal_demo, generate_kwargs={"language": "zh", "task": "transcribe", "prompt_ids": prompt_ids})

print(result["text"])
```

Who can help?

No response

Information

- [ ] The official example scripts
- [x] My own modified scripts

Tasks

- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

Reproduction

  1. Load the audio.
  2. Slice the audio.
  3. Add the prompt.
  4. Transcribe the sliced audio; the error occurs.
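For reference, the slice boundaries in the snippets above correspond to roughly 10.5 seconds of 16 kHz audio, i.e. the slice itself fits comfortably inside a single 30-second Whisper window. A quick sanity check of that arithmetic (plain Python, no audio file needed):

```python
# Sanity-check the slice bounds from the reproducer (sample indices at 16 kHz).
SAMPLE_RATE = 16_000
start, end = 23_196_064, 23_364_576

num_samples = end - start
duration_s = num_samples / SAMPLE_RATE

print(f"{num_samples} samples -> {duration_s:.2f} s")  # 168512 samples -> 10.53 s
assert 0 < duration_s <= 30, "slice should fit in one 30 s Whisper window"
```

So the failure is not caused by the slice being too long for the encoder; the problem shows up on the decoder side during generation.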

Expected behavior

The audio should be transcribed to text.

ryzn0518 · Jun 30 '23 15:06

cc @gante @sanchit-gandhi

amyeroberts · Jun 30 '23 15:06

Hey @diaojunxian 👋

Your reproducer contains private data, which means we can't easily reproduce on our end -- would you be able to share the audio file with us OR rewrite the reproducer from public data?

At a first glance, because of the thrown exception (`IndexError: index -1 is out of bounds for dimension 1 with size 0` in `next_token_logits = outputs.logits[:, -1, :]`), I'd bet something went wrong at preprocessing time :D bad model input shapes -> bad model output shapes
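As context for the "bad input shapes" hypothesis: Whisper's feature extractor pads or truncates every input to a fixed 30-second window before computing log-mel features, so for whisper-large-v2 (80 mel bins, hop length 160) the `input_features` tensor should always have shape `(batch, 80, 3000)`. A minimal sketch of that shape arithmetic, using only these published constants and no model download:

```python
# Expected log-mel feature shape for whisper-large-v2: the feature extractor
# pads every input to a fixed 30 s window before computing 80 mel bins.
CHUNK_LENGTH_S = 30
SAMPLE_RATE = 16_000
HOP_LENGTH = 160
N_MELS = 80

n_frames = CHUNK_LENGTH_S * SAMPLE_RATE // HOP_LENGTH
expected_shape = (1, N_MELS, n_frames)  # batch size 1
print(expected_shape)  # (1, 80, 3000)

# If processor(...).input_features.shape differs from this, preprocessing
# went wrong before the model ever ran.
```

Comparing `input_features.shape` against this expected shape is a cheap first check before digging into generation itself.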

gante · Jul 03 '23 09:07

> Hey @diaojunxian 👋
>
> Your reproducer contains private data, which means we can't easily reproduce on our end -- would you be able to share the audio file with us OR rewrite the reproducer from public data?
>
> At a first glance, because of the thrown exception (`IndexError: index -1 is out of bounds for dimension 1 with size 0` in `next_token_logits = outputs.logits[:, -1, :]`), I'd bet something went wrong at preprocessing time :D bad model input shapes -> bad model output shapes

I can send it to you privately, but it cannot be published on the internet, so only you would be able to verify this bug personally. Is that acceptable?

ryzn0518 · Jul 04 '23 10:07

@diaojunxian yeah, that would be helpful. You can send it to the email attached to my GH account ([email protected])

You are using an unmodified openai/whisper-large-v2, correct?

gante · Jul 04 '23 11:07

> start = 23196064
> end = 23364576

Yes, an unmodified whisper-large-v2, and I have sent the audio to the Gmail address.

ryzn0518 · Jul 05 '23 01:07

Hey @diaojunxian 👋

In both snippets, the problem is the same: as soon as the model tries to generate beyond its maximum length, the output sequence dimension becomes 0, causing the exception.

I've found the issue and will open a PR to fix it. The second example you provided works perfectly after the fix. The first one will probably still fail because of `max_new_tokens=3000` (Whisper's maximum length is 448, and we default generation to the maximum length, so you probably shouldn't set `max_new_tokens` at all :) )
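To make the length constraint concrete: the Whisper decoder has a hard cap of 448 positions (`model.config.max_length`), and the prompt tokens plus the forced decoder ids count against that budget. The sketch below, with a hypothetical `safe_max_new_tokens` helper and example token counts, illustrates the arithmetic; in practice the simplest fix is just to drop `max_new_tokens` and let generation default to the model maximum:

```python
# Whisper's decoder is capped at 448 positions. Asking for max_new_tokens=3000
# pushes generation past that cap, after which the output sequence dimension
# collapses to 0 and indexing outputs.logits[:, -1, :] raises IndexError.
WHISPER_MAX_LENGTH = 448

def safe_max_new_tokens(num_prompt_tokens: int, num_forced_tokens: int) -> int:
    """Largest max_new_tokens keeping prompt + forced + new tokens <= 448."""
    budget = WHISPER_MAX_LENGTH - num_prompt_tokens - num_forced_tokens
    return max(budget, 0)

# e.g. a 10-token prompt plus 4 forced decoder ids leaves room for 434 tokens:
print(safe_max_new_tokens(10, 4))  # 434
```

The exact token counts here are illustrative; the real numbers come from `len(prompt_ids)` and `len(forced_decoder_ids)` in the reproducer.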

gante · Jul 05 '23 10:07

After the PR linked above gets merged, you can install from `main` and it should work :)

gante · Jul 05 '23 10:07