llama2.zig
Inference Llama 2 in pure Zig
See:

- https://github.com/karpathy/llama2.c/issues/277
- https://github.com/karpathy/llama2.c/pull/298
- https://github.com/karpathy/llama2.c/pull/312
- https://github.com/karpathy/llama2.c/pull/364
- https://github.com/ggerganov/llama.cpp/issues/397
- https://arxiv.org/pdf/2101.01321v3.pdf
Model: https://huggingface.co/georgesung/llama2_7b_chat_uncensored
In `simd.zig`, the vector length is computed as:

```zig
comptime var vector_len = std.atomic.cache_line / @sizeOf(f32);
```

It looks to me like there is a unit mismatch here. `std.atomic.cache_line` is...
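For context: `std.atomic.cache_line` is the target's cache-line size in bytes, so dividing it by `@sizeOf(f32)` yields the number of `f32` elements that fit in one cache line, which is not the same thing as the hardware SIMD register width. A minimal sketch of an alternative, assuming a recent Zig standard library where `std.simd.suggestVectorLength` is available, might look like:

```zig
const std = @import("std");

// Sketch only, not the repository's actual code: derive the vector
// length from the target's SIMD register width rather than from the
// cache-line size. `suggestVectorLength` returns null on targets
// without SIMD support, so fall back to a scalar length of 1.
const vector_len = std.simd.suggestVectorLength(f32) orelse 1;

pub fn main() void {
    // For comparison: f32 elements per cache line (bytes / bytes-per-element).
    const per_cache_line = std.atomic.cache_line / @sizeOf(f32);
    std.debug.print(
        "suggested simd width: {d}, f32s per cache line: {d}\n",
        .{ vector_len, per_cache_line },
    );
}
```

On a typical x86-64 target with AVX2 this yields a width of 8 `f32`s per vector, while a 64-byte cache line holds 16 `f32`s, which is one way the two quantities can diverge.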