LLaMPPL
LLaMPPL copied to clipboard
Buffer calls to LLM
In models that sample tokens from the prior, it is unnecessary to actually run the LLM on the newly sampled token unless the particle survives the next resampling step. Maybe there is a good way to buffer or lazily execute the LLM calls so that this optimization is automated.