openai/whisper-large-v2 prompt
System Info
- transformers version: 4.38.2
- Platform: Windows-10-10.0.22621-SP0
- Python version: 3.9.18
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.0 (True)
Who can help?
@sanchit-gandhi
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

def get_transcription_local_model(file_path):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    model_id = "openai/whisper-large-v2"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=30,
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )
    result = pipe(file_path, generate_kwargs={"language": "en", "task": "translate"})
    return result["text"]
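A minimal usage sketch (the audio file path is a placeholder, not from the issue):

# Hypothetical call: any local audio file path will do.
text = get_transcription_local_model("audio.mp3")
print(text)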
Expected behavior
I used the code above to translate Arabic audio to English text, and I need to add some custom words to be tokenized. I also used the OpenAI API prompt (initial_prompt) and it works well, but the accuracy of the code above is better than the API's. I need to know in which part of this code I can add the prompt.
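For context, the prompting referred to is presumably the initial_prompt argument of the open-source whisper package; a rough sketch of that usage (the model name, file path, and prompt text are placeholders, not taken from the issue):

import whisper

# Sketch of the openai/whisper usage being compared against.
model = whisper.load_model("large-v2")
result = model.transcribe(
    "audio.mp3",
    task="translate",
    language="ar",
    initial_prompt="CustomWord1, CustomWord2",  # nudge decoding towards custom spellings
)
print(result["text"])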
cc @ylacombe too
Hey @kirollosHossam, I'm not sure I follow what your request is here. Could you detail it a bit? Many thanks!
This will be resolved by #28556; until then, you can manually compute your prompt ids and pass them to the pipeline as follows:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "distil-whisper/distil-large-v2"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline(
"automatic-speech-recognition",
model=model,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
max_new_tokens=128,
torch_dtype=torch_dtype,
device=device,
)
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[3]["audio"]
# --- Without prompt ---
result = pipe(sample.copy())
print("Without prompt:", result["text"])
# He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.
# --- With prompt ---
# Let's change the spelling of "Leighton" -> "Layton" by passing it as a prompt
prompt = "Layton"
prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to(device)
result = pipe(sample, generate_kwargs={"prompt_ids": prompt_ids})
print("With prompt:", result["text"][len(prompt) + 1:])
# He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.
There's a bug here where the prompt is included in the generated text; it will be fixed by https://github.com/huggingface/transformers/pull/27836!
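Until that fix lands, the prompt text can be stripped from the output manually; a small sketch (the strip_prompt helper is hypothetical, not part of transformers), reusing the result and prompt from the example above:

# Hypothetical helper: remove the prompt text that the current bug prepends to
# the transcription. Assumes the decoded prompt appears verbatim at the start.
def strip_prompt(text, prompt):
    text = text.lstrip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.lstrip()

print("With prompt:", strip_prompt(result["text"], prompt))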
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.