Brian Williams
Results: 2 issues
Adds a `scaling_factor` to the rotary embedding calculation, for use with models such as [DeepSeek](https://github.com/deepseek-ai/), which use `LlamaLinearScalingRotaryEmbedding`. The only difference from standard rotary embeddings is that the freqs in `precompute_freqs_cis` are divided...
CLA Signed
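The sentence above is cut off, so the exact change is not shown. As a minimal sketch, assume the freqs are divided by `scaling_factor`. That is what Hugging Face's `LlamaLinearScalingRotaryEmbedding` does: it divides the position index by the scaling factor, which gives the same angle. The signature below (`seq_len`, `n_elem`, `base`, `scaling_factor`) is an assumption for illustration. It uses only the standard library (`cmath.rect` builds each unit rotation) so it runs without PyTorch.

```python
import cmath


def precompute_freqs_cis(seq_len, n_elem, base=10000.0, scaling_factor=1.0):
    """Return a seq_len x (n_elem // 2) table of unit complex rotations.

    Assumed linear scaling: every angle t * f is divided by scaling_factor,
    equivalent to dividing the position index t by scaling_factor.
    """
    # One inverse frequency per pair of embedding dimensions.
    inv_freq = [1.0 / (base ** (i / n_elem)) for i in range(0, n_elem, 2)][: n_elem // 2]
    table = []
    for t in range(seq_len):
        # cis(theta) = cos(theta) + i*sin(theta), with the scaled angle theta.
        table.append([cmath.rect(1.0, t * f / scaling_factor) for f in inv_freq])
    return table


plain = precompute_freqs_cis(seq_len=16, n_elem=8)
scaled = precompute_freqs_cis(seq_len=16, n_elem=8, scaling_factor=4.0)
# With 4x linear scaling, position 8 receives the same rotations as position 2 unscaled.
assert all(cmath.isclose(a, b) for a, b in zip(scaled[8], plain[2]))
```

The effect of linear scaling is that a model trained on short contexts sees long-context positions compressed back into the range of angles it learned. With `scaling_factor=1.0`, the table reduces to the ordinary rotary embedding.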
Small models on Hugging Face (HF) don't ship `pytorch_model.bin.index.json` files, since a single-shard checkpoint doesn't need an index. I changed `convert_hf_checkpoint.py` to accept a single `pytorch_model.bin` file as the model description. I added PY007/TinyLlama-1.1B-intermediate-step-480k-1T to...
CLA Signed
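The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code. The helper name `resolve_checkpoint_files` is hypothetical. The one format detail relied on is that an HF `pytorch_model.bin.index.json` file stores a `"weight_map"` object that maps parameter names to shard file names. If the index is missing, the sketch falls back to a lone `pytorch_model.bin`.

```python
import json
from pathlib import Path


def resolve_checkpoint_files(checkpoint_dir):
    """Hypothetical helper: list the weight files to load from an HF checkpoint dir.

    Sharded checkpoints name their shards in pytorch_model.bin.index.json;
    small models ship one pytorch_model.bin with no index, so fall back to it.
    """
    checkpoint_dir = Path(checkpoint_dir)
    index_file = checkpoint_dir / "pytorch_model.bin.index.json"
    if index_file.is_file():
        with open(index_file) as f:
            weight_map = json.load(f)["weight_map"]
        # Many parameters share a shard; return each shard file once, sorted.
        return [checkpoint_dir / name for name in sorted(set(weight_map.values()))]
    single_file = checkpoint_dir / "pytorch_model.bin"
    if single_file.is_file():
        return [single_file]
    raise FileNotFoundError(f"no pytorch_model.bin or index file in {checkpoint_dir}")
```

A conversion script could then load each returned file, for example with `torch.load(path, map_location="cpu")`, and merge the resulting state dicts. Both the sharded case and the single-file case go through the same loop.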