speculative_decoding.c
A minimal C implementation of speculative decoding, based on llama2.c.
speculative_decoding.c issues
I ran the example (stories42M/stories15M) and compared its throughput (tok/sec) against the original llama2.c. This variant runs slower. Is that expected?