Add support for shallow biasing of Whisper
Attempting to fix https://github.com/OpenNMT/CTranslate2/pull/1789 by @zwycl
Adds an optional contextual-biasing parameter to the Whisper model that enables shallow contextual biasing toward given token sequences by modifying the logits during decoding. This is a flexible and fairly simple method, useful for transcribing out-of-vocabulary entities in ASR or for penalizing unwanted token sequences to mitigate harmful mistranscriptions. A similar parameter is implemented in the Hugging Face Transformers package: https://huggingface.co/docs/transformers/en/internal/generation_utils#transformers.SequenceBiasLogitsProcessor
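For readers unfamiliar with the technique, here is a minimal sketch of shallow sequence biasing in the spirit of the Hugging Face processor linked above (illustrative only, not the code in this PR): at each decoding step, if the decoded suffix matches all but the last token of a bias sequence, the bias value is added to the logit of that sequence's final token.

```python
import numpy as np

def apply_sequence_bias(logits, generated_ids, sequence_bias):
    """Shallow contextual biasing: add a bias to the logit of a
    sequence's final token whenever the decoded suffix matches the
    rest of that sequence.

    logits: 1-D array of next-token logits over the vocabulary.
    generated_ids: token ids decoded so far.
    sequence_bias: iterable of (token_id_tuple, bias_value) pairs.
    """
    for seq, bias in sequence_bias:
        prefix, last = list(seq[:-1]), seq[-1]
        # Single-token sequences are biased unconditionally; longer
        # ones only when the decoded suffix equals their prefix.
        if not prefix or list(generated_ids[-len(prefix):]) == prefix:
            logits[last] += bias
    return logits

# Toy example: boost token 7 after [3, 5]; penalize token 9 everywhere.
logits = np.zeros(10, dtype=np.float32)
apply_sequence_bias(logits, [1, 3, 5], [((3, 5, 7), 4.0), ((9,), -4.0)])
print(logits[7], logits[9])  # 4.0 -4.0
```

Positive bias values steer decoding toward a sequence; negative values steer it away, which is how unwanted mistranscriptions can be suppressed.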
Attempting to merge #1789
Do you have the Python wheels for this branch? I would like to use them.
Yes, you can find the links to the wheels via the checks below.
@minhthuc2502 would you mind taking a look at this PR? Would love to get this released!
There might be a bug in this PR. When I use sequence_bias with a model compiled in int8_float16, I get this error:
ValueError: expected storage to be of type float16, but is of type float32
which I don't get when sequence_bias is None, with the exact same input.
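A guess at the cause (untested): the bias values may be materialized as float32 and then combined with float16 logits, and CTranslate2's StorageView checks dtypes strictly rather than promoting. In NumPy terms:

```python
import numpy as np

logits = np.zeros(4, dtype=np.float16)                    # model decodes in float16
bias = np.array([0.0, 5.0, 0.0, 0.0], dtype=np.float32)  # NumPy's default float dtype

# NumPy silently promotes the sum to float32, but a strictly typed
# runtime would reject the mismatch; casting the bias to the logits
# dtype before adding avoids it.
logits = logits + bias.astype(logits.dtype)
print(logits.dtype)  # float16
```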
Sorry, I resolved it. Does this increase latency?
It does, unfortunately. Especially if the number of sequences to bias is large.
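A rough cost model, assuming the naive per-step matching sketched earlier (an assumption, not a profile of this PR): every decoding step rescans every bias sequence against the decoded suffix, so the overhead grows with both the number of sequences and their lengths.

```python
# Back-of-the-envelope cost of naive per-step matching (assumed, not profiled).
num_steps = 200        # tokens decoded per utterance
num_sequences = 1000   # bias sequences supplied
avg_seq_len = 5        # average bias-sequence length
comparisons = num_steps * num_sequences * avg_seq_len
print(f"{comparisons:,} token comparisons per utterance")  # 1,000,000
```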
Where does this latency increase come from?
Would it be possible to allow a list of lists of sequences to bias, so each sample in a batch can be biased separately? This would match how the features and prompts arguments already work, where the first dimension is the batch size. See the sketch below.
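To make the suggestion concrete, a hypothetical per-sample shape (names illustrative; the parameter is presumably shared across the whole batch today):

```python
# Hypothetical per-sample form: one list of (token_ids, bias_value)
# pairs per sample, so the outer dimension matches the batch size,
# as features and prompts already do.
sequence_bias = [
    [([1234, 5678], 5.0)],               # sample 0: boost one phrase
    [([42], -3.0), ([7, 8, 9], 2.0)],    # sample 1: two separate biases
]
assert len(sequence_bias) == 2          # outer dimension == batch size
```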