Ankan Banerjee
Results: 2 issues of Ankan Banerjee
slightly more than 2x speedup (for large batch sizes) on supported hardware, without much loss of precision.
lc0
Add simple cuda implementation for llama2 inference
* < 750 lines of code. Idea is to keep it as simple as possible.
* Decided to use FP16 to make llama-7b...