Aaron Lee

8 comments by Aaron Lee

Thanks all for the suggestions. Will definitely look to refactor into something nicer once correctness can be established. Right now, still trying to get the graph to compute. Turns out...

I've gotten to the point where I can get the MTP head to output stuff, but managing the KV cache with an external call to a separate MTP graph adds an...

On second thought, building a single augmented graph also doesn't work, because we need the main model's sampled token in the MTP subgraph. We could make some shortcut assumptions, like...
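A minimal sketch of the dependency described above, using toy stand-ins rather than any real llama.cpp API. The names `main_forward`, `sample`, `mtp_forward`, and `decode_step` are hypothetical, and the arithmetic inside them is made up purely so the example runs. The only point is that the MTP step takes the sampled token as an input, so sampling has to sit between the two evaluations:

```python
import random

VOCAB = 8  # toy vocabulary size (assumption for illustration)

def main_forward(tokens):
    """Hypothetical main-model pass: returns (hidden_state, logits)."""
    hidden = sum(tokens) % VOCAB
    logits = [1.0 if i == (hidden + 1) % VOCAB else 0.0 for i in range(VOCAB)]
    return hidden, logits

def sample(logits, rng):
    """Sampling runs outside the compute graph (here: a weighted random choice)."""
    weights = [l + 0.1 for l in logits]
    return rng.choices(range(VOCAB), weights=weights, k=1)[0]

def mtp_forward(hidden, sampled_token):
    """Hypothetical MTP head: needs the hidden state AND the sampled token."""
    return (hidden + sampled_token) % VOCAB  # toy draft for the next position

def decode_step(tokens, rng):
    hidden, logits = main_forward(tokens)  # evaluation 1: main model
    tok = sample(logits, rng)              # not part of either graph
    draft = mtp_forward(hidden, tok)       # evaluation 2: depends on tok
    return tok, draft

if __name__ == "__main__":
    print(decode_step([1, 2, 3], random.Random(0)))
```

Because `tok` only exists after sampling, `mtp_forward` cannot be fused into the same static graph as `main_forward` without some assumption about which token will be sampled.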

This commit sort of works, in the sense that it outputs tokens, but
- I can't guarantee that I didn't break things in the multi-slot case,
- the model seems...

Okay, I believe this commit "works" in that the main model and MTP outputs both seem correct under my informal test conditions. The model is now about as coherent as...

> Tried to run it in RP scenario (using Q4 quant), got from 0.07 to 0.11 acceptance rate on swipes (one time unexpectedly got 0.18) (t=0.8, min p 0.05, top...

Upon a bit of testing on my end in RP/creative writing scenarios, I can't find any obvious issues in terms of correctness with the cache management of this prototype; I...

> Is work on this still progressing in the background? If not, then what kind of work still remains to be done? Is it mainly cleanup and refactoring? If so,...