Add support for shallow biasing of Whisper
Attempting to fix https://github.com/OpenNMT/CTranslate2/pull/1789 by @zwycl
Adds an optional contextual-biasing parameter to the Whisper model that enables shallow contextual biasing toward given token sequences by modifying the logits during decoding. This is a flexible and fairly simple method, useful for transcribing out-of-vocabulary entities in ASR or for penalizing unwanted token sequences to mitigate harmful mistranscriptions. A similar parameter is implemented in the Hugging Face Transformers package: https://huggingface.co/docs/transformers/en/internal/generation_utils#transformers.SequenceBiasLogitsProcessor
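For readers unfamiliar with the technique, here is a minimal sketch of shallow sequence biasing in the spirit of the Hugging Face processor linked above (illustrative only, not the code in this PR): at each decoding step, if the decoded suffix matches all but the last token of a bias sequence, the bias value is added to the logit of that sequence's final token.

```python
import numpy as np

def apply_sequence_bias(logits, generated_ids, sequence_bias):
    """Shallow contextual biasing: add a bias to the logit of a
    sequence's final token whenever the decoded suffix matches the
    rest of that sequence.

    logits: 1-D array of next-token logits over the vocabulary.
    generated_ids: token ids decoded so far.
    sequence_bias: iterable of (token_id_tuple, bias_value) pairs.
    """
    for seq, bias in sequence_bias:
        prefix, last = list(seq[:-1]), seq[-1]
        # Single-token sequences are biased unconditionally; longer
        # ones only when the decoded suffix equals their prefix.
        if not prefix or list(generated_ids[-len(prefix):]) == prefix:
            logits[last] += bias
    return logits

# Toy example: boost token 7 after [3, 5]; penalize token 9 everywhere.
logits = np.zeros(10, dtype=np.float32)
apply_sequence_bias(logits, [1, 3, 5], [((3, 5, 7), 4.0), ((9,), -4.0)])
print(logits[7], logits[9])  # 4.0 -4.0
```

Positive bias values steer decoding toward a sequence; negative values steer it away, which is how unwanted mistranscriptions can be suppressed.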
Attempting to merge #1789
Do you have the Python wheels for this branch? I would like to use them.
Yes, you can find the links to the wheels via the checks below.
@minhthuc2502 would you mind taking a look at this PR? Would love to get this released!
There might be a bug in this PR. When I use sequence_bias with a model compiled in int8_float16, I get this error:
ValueError: expected storage to be of type float16, but is of type float32
which I don't get when sequence_bias is None, with the exact same input.
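A guess at the cause (untested): the bias values may be materialized as float32 and then combined with float16 logits, and CTranslate2's StorageView checks dtypes strictly rather than promoting. In NumPy terms:

```python
import numpy as np

logits = np.zeros(4, dtype=np.float16)                    # model decodes in float16
bias = np.array([0.0, 5.0, 0.0, 0.0], dtype=np.float32)  # NumPy's default float dtype

# NumPy silently promotes the sum to float32, but a strictly typed
# runtime would reject the mismatch; casting the bias to the logits
# dtype before adding avoids it.
logits = logits + bias.astype(logits.dtype)
print(logits.dtype)  # float16
```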
Sorry, I resolved it. Does this increase latency?
It does, unfortunately. Especially if the number of sequences to bias is large.
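A rough cost model, assuming the naive per-step matching sketched earlier (an assumption, not a profile of this PR): every decoding step rescans every bias sequence against the decoded suffix, so the overhead grows with both the number of sequences and their lengths.

```python
# Back-of-the-envelope cost of naive per-step matching (assumed, not profiled).
num_steps = 200        # tokens decoded per utterance
num_sequences = 1000   # bias sequences supplied
avg_seq_len = 5        # average bias-sequence length
comparisons = num_steps * num_sequences * avg_seq_len
print(f"{comparisons:,} token comparisons per utterance")  # 1,000,000
```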
Where does this latency increase come from?
Would it be possible to allow a list of lists of sequences to bias, so each sample in a batch can be biased separately? This would match how the features and prompts arguments already work, where the first dimension is the batch size. See the sketch below.
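To make the suggestion concrete, a hypothetical per-sample shape (names illustrative; the parameter is presumably shared across the whole batch today):

```python
# Hypothetical per-sample form: one list of (token_ids, bias_value)
# pairs per sample, so the outer dimension matches the batch size,
# as features and prompts already do.
sequence_bias = [
    [([1234, 5678], 5.0)],               # sample 0: boost one phrase
    [([42], -3.0), ([7, 8, 9], 2.0)],    # sample 1: two separate biases
]
assert len(sequence_bias) == 2          # outer dimension == batch size
```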