lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.
I want to train the 13B LLaMA with 8-bit quantization and LoRA. Right now it takes 70GB of GPU RAM, which is quite a lot. I'm using 8xA100-80GB. `lora.py` ``` #...
Hi, I'm confused about where to find the tokenizer: --tokenizer_path checkpoints/lit-llama/tokenizer.model Referring here to the README: where can I download it?
If I want to use Llama 3 through lit-llama, how can I modify it? I found that the Llama 3 model structure has changed.
Hi, I have pretrained a model and have it in lit-llama format. How can I convert it to Hugging Face format? I need to load my pretrained model via Hugging Face for...
When I invoke the generate function twice with different `idx`, I get the error "RuntimeError: The expanded size of the tensor (181) must match the existing size (168) at...
Hello, Thanks for the great work! I have a pre-trained Lit-Llama checkpoint that I'd like to convert to a format supported by HF, so that I could use it as...
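In practice, a checkpoint conversion like this mostly amounts to remapping parameter names from one state dict layout to the other. A minimal, hypothetical sketch (the name mapping shown is illustrative only; the real lit-llama and Hugging Face parameter names must be verified against both codebases):

```python
# Hypothetical key mapping from lit-llama names to HF names.
# These example keys are placeholders; check the actual model definitions.
NAME_MAP = {
    "transformer.wte.weight": "model.embed_tokens.weight",
}

def remap_state_dict(lit_sd):
    """Rename lit-llama parameter keys to their (assumed) HF equivalents.

    Keys with no entry in NAME_MAP are kept unchanged.
    """
    hf_sd = {}
    for lit_name, tensor in lit_sd.items():
        hf_name = NAME_MAP.get(lit_name, lit_name)
        hf_sd[hf_name] = tensor
    return hf_sd

# Toy example standing in for torch.load("lit-llama.pth"):
lit_sd = {"transformer.wte.weight": [0.0, 0.0], "other.weight": [1.0]}
print(remap_state_dict(lit_sd))
# → {'model.embed_tokens.weight': [0.0, 0.0], 'other.weight': [1.0]}
```

On the real checkpoints you would load the source with `torch.load`, remap, and save the result where the HF loader expects it; weight tensors may also need reshaping if the two implementations fuse attention projections differently.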
Upon every restart of fine-tuning I see: "train data seems to have changed. restarting shuffled epoch." I looked up where it happens, added a debugging line, and it turned out that...
I noticed that `PackedDatasetBuilder` does not separate the tokens with `sep_token`. To illustrate, referencing https://github.com/Lightning-AI/lit-llama/blob/da71adea0970d6d950fb966d365cfb428aef8298/scripts/prepare_redpajama.py#L71

```py
builder = packed_dataset.PackedDatasetBuilder(
    outdir=destination_path,
    prefix=prefix,
    chunk_size=chunk_size,
    sep_token=tokenizer.bos_id,
    dtype="auto",
    vocab_size=tokenizer.vocab_size,
)
```

and https://github.com/Lightning-AI/lit-llama/blob/da71adea0970d6d950fb966d365cfb428aef8298/scripts/prepare_redpajama.py#L85 ```py...
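The behavior being asked for, separating packed documents with a delimiter token, can be illustrated with a plain-Python sketch (the `SEP` id and function are made up for illustration and are not the builder's actual internals):

```python
SEP = 1  # stand-in for tokenizer.bos_id

def pack_with_separator(docs, sep=SEP):
    """Concatenate tokenized documents, inserting `sep` between them."""
    packed = []
    for i, doc in enumerate(docs):
        if i > 0:
            packed.append(sep)  # delimiter between consecutive documents
        packed.extend(doc)
    return packed

# Two toy "documents" of token ids:
print(pack_with_separator([[5, 6], [7, 8, 9]]))  # → [5, 6, 1, 7, 8, 9]
```

Whether `sep_token` is meant to delimit documents or only to pad out the final chunk is exactly the ambiguity this issue raises.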
When running generation for multiple consecutive inputs on a LoRA fine-tuned LLaMA, I noticed that calling `reset_cache` after each generation for one input affects the performance of generation on...
How can I train LLaMA using TPUs?