
openai/whisper-large-v2 prompt

kirollosHossam opened this issue

System Info

  • transformers version: 4.38.2
  • Platform: Windows-10-10.0.22621-SP0
  • Python version: 3.9.18
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0 (True)

Who can help?

@sanchit-gandhi

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


def get_transcription_local_model(file_path):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "openai/whisper-large-v2"

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=30,
        batch_size=16,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )
    result = pipe(file_path, generate_kwargs={"language": "en", "task": "translate"})
    return result["text"]

Expected behavior

I use the code above to translate Arabic audio to English text, and I need to add some custom words so they are tokenized correctly. With the OpenAI API I used the prompt parameter (initial_prompt) and it works well, but the accuracy of the code above is better than the API's. I need to know in which part of this code I can add the prompt.

kirollosHossam avatar Mar 13 '24 09:03 kirollosHossam

cc @ylacombe too

amyeroberts avatar Apr 15 '24 10:04 amyeroberts

Hey @kirollosHossam, I'm not sure I follow your request here. Could you detail it a bit? Many thanks!

ylacombe avatar May 13 '24 16:05 ylacombe

This will be resolved by #28556; until then, you can manually compute your prompt ids and pass them to the pipeline as follows:

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[3]["audio"]

# --- Without prompt ---
result = pipe(sample.copy())
print("Without prompt:", result["text"])
# He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.

# --- With prompt ---
# Let's change the spelling of "Leighton" -> "Layton" by passing it as a prompt
prompt = "Layton"
prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to(device)
result = pipe(sample, generate_kwargs={"prompt_ids": prompt_ids})
print("With prompt:", result["text"][len(prompt) + 1:])
# He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of Rocky Ithaca.

sanchit-gandhi avatar May 20 '24 16:05 sanchit-gandhi

There's a bug here that the prompt is included in the generation, will be fixed by https://github.com/huggingface/transformers/pull/27836!
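Until that fix lands, the prompt text can be trimmed off the decoded output by hand, as the slicing at the end of the snippet above does. A minimal pure-Python sketch (`strip_prompt` is a hypothetical helper name; it assumes the pipeline currently prepends the prompt, optionally preceded by a space, to the transcription):

```python
def strip_prompt(text: str, prompt: str) -> str:
    """Remove a leading prompt (plus separating space) from decoded text.

    Assumption: due to the bug above, the pipeline prepends the prompt
    to the output, possibly with a leading space. Text without the
    prompt prefix is returned unchanged.
    """
    for prefix in (prompt + " ", " " + prompt + " "):
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


# Example: the prompt "Layton" leaks into the decoded text.
decoded = "Layton He has grave doubts whether Sir Frederick Layton's work is really Greek after all."
print(strip_prompt(decoded, "Layton"))
```

This is more robust than a fixed `[len(prompt) + 1:]` slice, since it only strips the prefix when the prompt actually appears at the start of the output.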

sanchit-gandhi avatar May 22 '24 13:05 sanchit-gandhi

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 16 '24 08:06 github-actions[bot]