openai/whisper-large-v2 prompt
System Info
- transformers version: 4.38.2
- Platform: Windows-10-10.0.22621-SP0
- Python version: 3.9.18
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.0 (True)
Who can help?
@sanchit-gandhi
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

def get_transcription_local_model(file_path):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    model_id = "openai/whisper-large-v2"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=30,
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )
    result = pipe(file_path, generate_kwargs={"language": "en", "task": "translate"})
    return result["text"]
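A minimal usage sketch (the audio file path is a placeholder, not from the issue):

# Hypothetical call: any local audio file path will do.
text = get_transcription_local_model("audio.mp3")
print(text)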
Expected behavior
I used the code above to translate Arabic audio to English text, and I need to add some custom words to be tokenized. I also used the OpenAI API prompt (initial_prompt) and it works well, but the accuracy of the code above is better than the API's. I need to know in which part of this code I can add the prompt.
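For context, the prompting referred to is presumably the initial_prompt argument of the open-source whisper package; a rough sketch of that usage (the model name, file path, and prompt text are placeholders, not taken from the issue):

import whisper

# Sketch of the openai/whisper usage being compared against.
model = whisper.load_model("large-v2")
result = model.transcribe(
    "audio.mp3",
    task="translate",
    language="ar",
    initial_prompt="CustomWord1, CustomWord2",  # nudge decoding towards custom spellings
)
print(result["text"])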
cc @ylacombe too
Hey @kirollosHossam, I'm not sure I follow what your request is here. Could you detail it a bit? Many thanks!
This will be resolved by #28556; until then, you can manually compute your prompt ids and pass them to the pipeline as follows:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "distil-whisper/distil-large-v2"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline(
"automatic-speech-recognition",
model=model,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
max_new_tokens=128,
torch_dtype=torch_dtype,
device=device,
)
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[3]["audio"]
# --- Without prompt ---
result = pipe(sample.copy())
print("Without prompt:", result["text"])
# He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.
# --- With prompt ---
# Let's change the spelling of "Leighton" -> "Layton" by passing it as a prompt
prompt = "Layton"
prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to(device)
result = pipe(sample, generate_kwargs={"prompt_ids": prompt_ids})
print("With prompt:", result["text"][len(prompt) + 1:])
# He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.
There's a bug here where the prompt is included in the generated text; it will be fixed by https://github.com/huggingface/transformers/pull/27836!
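Until that fix lands, the prompt text can be stripped from the output manually; a small sketch (the strip_prompt helper is hypothetical, not part of transformers), reusing the result and prompt from the example above:

# Hypothetical helper: remove the prompt text that the current bug prepends to
# the transcription. Assumes the decoded prompt appears verbatim at the start.
def strip_prompt(text, prompt):
    text = text.lstrip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.lstrip()

print("With prompt:", strip_prompt(result["text"], prompt))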
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.