LLaMPPL

Buffer calls to LLM

Open alex-lew opened this issue 2 years ago • 0 comments

In models that sample tokens from the prior, there is no need to run the LLM on a newly sampled token unless the particle survives the next resampling step. Maybe there is a good way to buffer or lazily execute the LLM calls so that this optimization happens automatically.
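One way this could look, as a rough sketch only: each particle holds an immutable context whose forward pass is deferred until its next-token distribution is actually requested. Everything here (`LazyContext`, `CountingLLM`, `smc_step`) is a hypothetical name made up for illustration, not part of LLaMPPL's API, and the "LLM" is a stand-in that returns a uniform distribution and counts how often it is called.

```python
import random
from typing import List, Optional, Sequence, Tuple


class CountingLLM:
    """Stand-in for a real model (assumption): uniform next-token
    probabilities, plus a counter of how many forward passes ran."""

    def __init__(self, vocab_size: int = 4) -> None:
        self.vocab_size = vocab_size
        self.calls = 0

    def __call__(self, tokens: Tuple[int, ...]) -> List[float]:
        self.calls += 1
        return [1.0 / self.vocab_size] * self.vocab_size


class LazyContext:
    """Immutable token context; the LLM runs only when probs() is first called."""

    def __init__(self, llm: CountingLLM, tokens: Tuple[int, ...] = ()) -> None:
        self.llm = llm
        self.tokens = tokens
        self._probs: Optional[List[float]] = None  # cached forward-pass result

    def extend(self, token: int) -> "LazyContext":
        # Buffer the new token without touching the LLM.
        return LazyContext(self.llm, self.tokens + (token,))

    def probs(self) -> List[float]:
        # Force the deferred forward pass once, then reuse the cached result.
        if self._probs is None:
            self._probs = self.llm(self.tokens)
        return self._probs


def smc_step(
    particles: Sequence[LazyContext],
    weights: Sequence[float],
    rng: random.Random,
) -> List[LazyContext]:
    """Sample one token per particle from the prior, then resample.

    `weights` stands in for whatever the model's scoring produces.
    Extended contexts that are not picked are dropped before any
    forward pass is run on their new token.
    """
    extended = []
    for p in particles:
        dist = p.probs()  # needs the *current* context only
        token = rng.choices(range(len(dist)), weights=dist)[0]
        extended.append(p.extend(token))  # no LLM call here
    # Survivors share the same LazyContext object, so a duplicated
    # particle triggers at most one forward pass later on.
    return rng.choices(extended, weights=weights, k=len(extended))
```

With four particles that start from a shared root and weights that keep only the first one, this sketch runs the model once for the root and once for the single surviving context, where eager evaluation would have run it on all four extended contexts. A real implementation would also need to handle KV-cache reuse and batching, which this sketch ignores.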

alex-lew, May 22 '23 05:05