tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Fixes issue where `pdm run main --solution ref --loader week1` would fail with "ModuleNotFoundError: No module named 'tiny_llm_ref'"
When I run `pdm run test-refsol -- -- -k week_1`:

```
src/tiny_llm_ref/__init__.py:7: in <module>
    from .generate import *
E   File "/Users/gao/develop/tiny-llm/src/tiny_llm_ref/generate.py", line 134
E     print(f"+{progress} {text.replace('\n', ' ')[-80:]}")
E                                               ^
E ...
```
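For context, this is the Python < 3.12 `SyntaxError` for a backslash (here `'\n'`) inside an f-string expression, which PEP 701 only lifted in 3.12. A minimal sketch of a fix is to hoist the `replace` call out of the f-string; the variable values below are made up for illustration, only the names mirror the traceback:

```python
# Hypothetical stand-ins for the real values in generate.py.
text = "hello\nworld, this is a long generated line"
progress = 42

# Before (SyntaxError on Python < 3.12):
#   print(f"+{progress} {text.replace('\n', ' ')[-80:]}")

# After: compute the cleaned suffix first, then format it.
clean = text.replace("\n", " ")[-80:]
print(f"+{progress} {clean}")
```

This keeps the same output while staying compatible with older interpreters.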
- Add a complete `quantized_matmul_impl_typed` template function for CPU (float16, float32, and bfloat16).
- Add fp32 test cases for `quantized_matmul`.
- Relax the float32 tolerance in test utils.
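For context, group-wise quantized matmul stores integer codes plus a per-group scale and bias, and dequantizes on the fly during the multiply. A rough NumPy sketch of the idea (the function and parameter names here are illustrative, not the actual C++ template from the PR):

```python
import numpy as np

def quantized_matmul(x, w_q, scales, biases, group_size=4):
    """Dequantize group-quantized weights, then matmul (illustrative only)."""
    s = np.repeat(scales, group_size, axis=1)  # per-group scale -> per-element
    b = np.repeat(biases, group_size, axis=1)
    w = w_q.astype(x.dtype) * s + b            # w_hat = code * scale + bias
    return x @ w.T

# Quantize random weights to 4-bit codes in groups of 4, then compare
# against the full-precision matmul.
rng = np.random.default_rng(0)
w = rng.standard_normal((2, 8)).astype(np.float32)
g = w.reshape(2, -1, 4)                       # (out, groups, group_size)
lo, hi = g.min(axis=2), g.max(axis=2)
scales = (hi - lo) / 15                       # 4-bit: codes in 0..15
codes = np.round((g - lo[..., None]) / scales[..., None]).reshape(2, 8)
x = rng.standard_normal((1, 8)).astype(np.float32)
err = np.abs(quantized_matmul(x, codes, scales, lo) - x @ w.T).max()
print(err)  # small dequantization error
```

The fp16/fp32/bf16 split in the real kernel exists because each accumulator dtype needs its own tolerance, which is also why the float32 test tolerance was relaxed.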
During my own testing and from other people's feedback, it seems that some kernels on M1 have precision issues in RoPE, likely due to sin/cos. https://github.com/skyzh/tiny-llm/issues/27
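A tiny NumPy sketch (not the actual Metal kernel) of the suspected effect: a sin result evaluated and stored in half precision keeps only about three decimal digits, which is enough to perturb RoPE rotation angles:

```python
import math
import numpy as np

# Illustrative only: compare sin computed through float16 against a
# float64 reference. The gap is the kind of rounding error that can
# surface when a GPU kernel evaluates sin/cos in half precision.
angle = 1.0
sin16 = float(np.sin(np.float16(angle)))  # result rounded to fp16
sin64 = math.sin(angle)
print(abs(sin16 - sin64))
```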
The PyTorch test is okay, but I don't know whether the MLX part will work.
It would be great to add support for single-node, multi-process inference to Tiny LLM.
Thanks for creating this awesome tutorial. It's very helpful! Just curious: is there any timeline for updating the week 2+ tasks/tests/docs? Totally understand that the authors are busy, but just curious :)