cria

Tiny inference-only implementation of LLaMA

Inference-only implementation of LLaMA in plain NumPy

Because it uses NumPy, it can run without a GPU. It also loads the weights through memory-mapped files, so it can run with little memory.
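The README does not show the loading code, so what follows is only a minimal sketch of the general technique. It assumes each weight matrix is stored as its own `.npy` file; the file name and the helpers `save_weight`, `load_weight` and `linear` are hypothetical and are not cria's actual layout or API. `np.load(..., mmap_mode="r")` maps the file into memory instead of reading it, so the OS pages data in from disk only when a computation touches it.

```python
import os
import tempfile

import numpy as np


def save_weight(path: str, array: np.ndarray) -> None:
    """Write one weight matrix to disk in NumPy's .npy format (hypothetical layout)."""
    np.save(path, array)


def load_weight(path: str) -> np.memmap:
    """Open a .npy file as a read-only memory map.

    No array data is read into RAM here; the OS loads pages lazily
    when parts of the array are actually used.
    """
    return np.load(path, mmap_mode="r")


def linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply a weight matrix to an activation vector: y = x @ w."""
    return x @ w


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "layer0_wq.npy")  # hypothetical file name
        rng = np.random.default_rng(0)
        save_weight(path, rng.standard_normal((64, 64)).astype(np.float32))

        w = load_weight(path)
        x = np.ones(64, dtype=np.float32)
        y = linear(x, w)  # only now are the weight pages read from disk
        print(type(w).__name__, y.shape)
        del w  # drop the mapping before the temporary directory is removed
```

Because the mapping is read-only and file-backed, the OS can evict its pages under memory pressure and re-read them from disk later, which is what allows inference with little free RAM, at the cost of extra disk reads.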

Inspired by picoGPT.

Besides NumPy, it currently depends on Google's SentencePiece for tokenization.
