Buffer calls to LLM

Open · alex-lew opened this issue 2 years ago · 0 comments

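In models that sample tokens from the prior, it is unnecessary to run the LLM on a newly sampled token unless the particle survives the next resampling step. Drawing the token only needs the next-token distribution that has already been computed. The forward pass on the new token only produces the distribution for the following step, so that work is wasted whenever the particle is discarded. Maybe there is a good way to buffer or lazily execute the LLM calls so that this optimization happens automatically.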
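One possible way to automate this is lazy evaluation. This is a minimal sketch under assumptions, not an existing API. Each particle appends its sampled token without running the model, and the forward pass on a context only happens when a next-token distribution is actually requested, which is on the following step, after resampling. Results are memoized by context so that duplicate copies of one survivor share a single call. `CountingLM`, `LazyParticle`, `smc`, the toy distribution, and the soft penalty on token `"c"` are all hypothetical stand-ins for a real LLM and model. The `eager=True` mode runs the forward pass immediately after each sample, for comparison.

```python
import random

VOCAB = ["a", "b", "c"]


class CountingLM:
    """Hypothetical stand-in for an LLM; counts forward passes (cache misses)."""

    def __init__(self):
        self.forward_passes = 0
        self._cache = {}  # context tuple -> next-token distribution

    def next_token_probs(self, context):
        if context not in self._cache:
            self.forward_passes += 1
            # Toy distribution that depends only on the context length.
            n = len(context)
            w = [1 + (i + n) % 3 for i in range(len(VOCAB))]
            total = sum(w)
            self._cache[context] = [x / total for x in w]
        return self._cache[context]


class LazyParticle:
    """Particle that defers the forward pass on its newest token."""

    def __init__(self, lm, context=()):
        self.lm = lm
        self.context = context

    def extend(self, rng):
        # Sampling from the prior only needs the distribution for the
        # current context. For a particle that survived resampling, this
        # is where the deferred forward pass actually runs.
        probs = self.lm.next_token_probs(self.context)
        token = rng.choices(VOCAB, weights=probs)[0]
        # Append the token but do NOT run the LM on it yet.
        self.context = self.context + (token,)

    def copy(self):
        return LazyParticle(self.lm, self.context)


def smc(lm, n_particles, n_steps, seed=0, eager=False):
    rng = random.Random(seed)
    particles = [LazyParticle(lm) for _ in range(n_particles)]
    for _ in range(n_steps):
        for p in particles:
            p.extend(rng)
            if eager:
                # Eager baseline: run the LM on the new token right away,
                # even if this particle is about to be discarded.
                lm.next_token_probs(p.context)
        # Hypothetical soft constraint: token "c" is heavily down-weighted.
        weights = [0.1 if p.context[-1] == "c" else 1.0 for p in particles]
        # Multinomial resampling; copies of one survivor share a context,
        # so the memoized LM runs at most once per distinct survivor.
        survivors = rng.choices(particles, weights=weights, k=n_particles)
        particles = [p.copy() for p in survivors]
    return particles
```

Running `smc(CountingLM(), 8, 5)` with and without `eager=True` produces identical particles, because the extra forward passes consume no randomness. The lazy run never evaluates contexts that resampling discards, nor the contexts left after the final step, so it always performs fewer forward passes. In a real system the memoized call would be a batched forward pass that reuses a KV cache, but that is beyond this sketch.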

May 22 '23 05:05 alex-lew