Fast categorical distribution
When constructed from (unquantized) floating-point probabilities, the existing categorical distributions calculate the exact optimal quantization, i.e., the quantization that minimizes the KL divergence to the unquantized distribution. This turns out to be excruciatingly slow, which is especially costly in autoregressive models with a large vocabulary, where each categorical model is used to encode or decode only a single symbol before being discarded.
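For concreteness, the task can be stated as follows (a formalization for illustration only; the existing implementation may differ in details such as the direction of the KL divergence and whether zero frequencies are permitted). Quantizing a probability vector $p = (p_1, \dots, p_n)$ to a fixed-point precision of $k$ bits amounts to solving

$$\min_{f_1, \dots, f_n \in \mathbb{Z}_{>0}} \; D_{\mathrm{KL}}\!\left(p \,\middle\|\, \frac{f}{2^k}\right) \quad \text{subject to} \quad \sum_{i=1}^{n} f_i = 2^k,$$

and the existing constructors solve this exactly for every model instance.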
This PR will add additional variants of categorical distributions that use various combinations of:
- a simplified quantization method (similar to the one used by `LeakyQuantizer`) that will be much faster at the price of a small overhead in bitrate (see the sketch after this list); and
- (optionally) lazy (symbol-local) quantization at encoding and decoding time rather than a global quantization of all symbols at model construction time.
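To make the trade-off concrete, here is a minimal, hypothetical sketch of both ideas. Everything here (function names, signatures, and the scale-round-repair strategy) is illustrative only and not necessarily what this PR will implement:

```rust
/// Simplified "leaky" quantization in the spirit of `LeakyQuantizer`:
/// scale and truncate each probability, clamp it to at least 1 so that no
/// symbol loses all of its mass, then repair the accumulated rounding error
/// so that the frequencies sum to exactly `1 << PRECISION`.
fn quantize_fast<const PRECISION: u32>(probs: &[f64]) -> Vec<u32> {
    let scale = (1u64 << PRECISION) as f64;
    let mut freqs: Vec<u32> = probs
        .iter()
        .map(|&p| ((p * scale) as u32).max(1)) // "leaky": never assign zero mass
        .collect();

    // Dump the rounding error on the largest frequency. A real implementation
    // would distribute the correction more carefully (and check for underflow).
    let excess = freqs.iter().map(|&f| i64::from(f)).sum::<i64>() - (1i64 << PRECISION);
    let argmax = (0..freqs.len()).max_by_key(|&i| freqs[i]).unwrap();
    freqs[argmax] = (i64::from(freqs[argmax]) - excess) as u32;
    freqs
}

/// Lazy (symbol-local) quantization: compute the fixed-point interval of a
/// single symbol only when it is actually encoded, instead of quantizing the
/// whole distribution at construction time. Assumes `PRECISION < 32` so the
/// boundaries fit in a `u32`.
fn lazy_symbol_params<const PRECISION: u32>(probs: &[f64], symbol: usize) -> (u32, u32) {
    let scale = (1u64 << PRECISION) as f64;
    let cdf_left: f64 = probs[..symbol].iter().sum();
    // Rounding both boundaries the same way makes adjacent symbols tile the
    // quantile range without gaps or overlaps; a real implementation must
    // additionally guarantee a nonzero interval width ("leakiness").
    let left = (cdf_left * scale) as u32;
    let right = ((cdf_left + probs[symbol]) * scale) as u32;
    (left, right - left) // (left_cumulative, frequency)
}
```

The symbol-local variant makes model construction essentially free; the cost moves to encoding and decoding time, where a decoder has to search for the symbol whose lazily quantized interval contains the received quantile.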