
Fast categorical distribution

Open robamler opened this issue 1 year ago • 0 comments

When constructed from (unquantized) floating-point probabilities, the existing categorical distributions calculate the exact optimal quantization, i.e., the quantization that minimizes the KL divergence to the unquantized distribution. This turns out to be excruciatingly slow, which is especially costly for autoregressive models with a large vocabulary, where each categorical model is used to encode or decode only a single symbol before being discarded.
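
Schematically, exact quantization involves something like the following (just an illustrative sketch in Rust, not the actual implementation; the function name `quantize_optimal` is made up): after an initial rounding, single units of probability mass have to be shifted around until the weights sum to `1 << precision`, and every shift requires comparing the bitrate impact across all symbols.

```rust
/// Illustrative sketch only (not constriction's actual code): computes integer
/// weights that sum to `1 << precision` while trying to stay KL-optimal.
fn quantize_optimal(probs: &[f64], precision: u32) -> Vec<u32> {
    let total = 1i64 << precision;
    assert!((probs.len() as i64) <= total);

    // Start from naive rounding, clamped to a minimum weight of 1 so that
    // every symbol remains encodable.
    let mut weights: Vec<u32> = probs
        .iter()
        .map(|&p| ((p * total as f64).round() as u32).max(1))
        .collect();
    let mut sum: i64 = weights.iter().map(|&w| i64::from(w)).sum();

    // Greedily move single units of probability mass until the weights sum to
    // `total`, each time picking the symbol where the move costs the least (or
    // gains the most) bitrate. Every iteration scans all symbols, which is
    // what makes exact quantization comparatively expensive.
    while sum != total {
        if sum > total {
            let i = (0..weights.len())
                .filter(|&i| weights[i] > 1)
                .min_by(|&a, &b| {
                    let cost = |i: usize| {
                        probs[i] * ((weights[i] as f64).ln() - (weights[i] as f64 - 1.0).ln())
                    };
                    cost(a).partial_cmp(&cost(b)).unwrap()
                })
                .expect("at least one weight must be > 1");
            weights[i] -= 1;
            sum -= 1;
        } else {
            let i = (0..weights.len())
                .max_by(|&a, &b| {
                    let gain = |i: usize| {
                        probs[i] * ((weights[i] as f64 + 1.0).ln() - (weights[i] as f64).ln())
                    };
                    gain(a).partial_cmp(&gain(b)).unwrap()
                })
                .expect("probs must not be empty");
            weights[i] += 1;
            sum += 1;
        }
    }

    weights
}
```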

This PR will add new variants of the categorical distributions that use various combinations of:

  • a simplified quantization method (similar to the one used by LeakyQuantizer) that will be much faster at the price of a small overhead in bitrate (see the sketch after this list); and (optionally)
  • lazy (symbol-local) quantization at encoding and decoding time rather than a global quantization of all symbols at model construction time.
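
To make the first point concrete, the simplified quantization could look something like the sketch below (again just an illustration; the function name `quantize_fast` and all details are placeholders, not the API that this PR will introduce):

```rust
/// Illustrative sketch of a simplified, single-pass quantization in the spirit
/// of LeakyQuantizer. Rounding residue is simply dumped on the largest weight
/// instead of being distributed optimally, trading a small bitrate overhead
/// for an O(n) construction with cheap per-symbol work.
fn quantize_fast(probs: &[f64], precision: u32) -> Vec<u32> {
    let total = 1i64 << precision;
    assert!((probs.len() as i64) < total);

    // One pass: round each probability to a nonzero weight.
    let mut weights: Vec<u32> = probs
        .iter()
        .map(|&p| ((p * total as f64).round() as u32).max(1))
        .collect();

    // Fix up the total by adjusting only the single largest weight (no global
    // search for the best correction).
    let sum: i64 = weights.iter().map(|&w| i64::from(w)).sum();
    let argmax = (0..weights.len()).max_by_key(|&i| weights[i]).unwrap();
    let corrected = i64::from(weights[argmax]) + (total - sum);
    assert!(corrected >= 1, "rounding residue too large for this simple sketch");
    weights[argmax] = corrected as u32;

    weights
}
```

The lazy, symbol-local variant would go one step further: instead of materializing all weights at model construction time, the encoder or decoder would only compute the weight (and left-sided cumulative) of the symbol it is currently coding, roughly analogous to how LeakyQuantizer quantizes a parametric distribution only at the points where it is actually evaluated.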

robamler · Aug 16, 2024 18:08