Ankan Banerjee issues


Results: 2 issues of Ankan Banerjee

lc0-cudnn : add support for fp16 network eval

Slightly more than a 2x speedup (for large batch sizes) on supported hardware, without much loss of precision.

lc0

llama2.cu - a simple cuda implementation

Add simple cuda implementation for llama2 inference
* < 750 lines of code. Idea is to keep it as simple as possible.
* Decided to use FP16 to make llama-7b...