Nicky Pochinkov

Results: 8 comments of Nicky Pochinkov

Looks like the last issue is mentioned here: https://github.com/matrix-org/dendrite/issues/1567. Not sure if they have decided how best to handle unknown options other than giving an error, so...

See this commit (https://github.com/karpathy/llama2.c/pull/395/commits/fc11cc387b47efd98ca4ac0956f715d2e5451c41) or [line L224 of `model.py`](https://github.com/karpathy/llama2.c/blob/766a30bc6e9a1c69ce007bb69caabf4c6062f0e9/model.py#L224) to see where the weights are tied, or see this issue for more discussion: https://github.com/karpathy/llama2.c/issues/321#issuecomment-1722272404
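
For context, the tying itself is just a one-line assignment that makes the unembedding matrix share storage with the token embedding. A minimal PyTorch sketch (module names follow the llama2.c-style `tok_embeddings` / `output` convention, not necessarily the exact code in the linked commit):

```python
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int = 32000, dim: int = 288):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)   # weight shape (vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)  # weight shape (vocab_size, dim)
        # Weight tying: the unembedding matrix shares storage with the token
        # embedding, so the two parameters stay identical during training.
        self.output.weight = self.tok_embeddings.weight
```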

Didn't see this before, but I have submitted PR https://github.com/karpathy/llama2.c/pull/395, which tries to address the same thing.

It works fine for me, based on [the models I uploaded to the HF hub](https://huggingface.co/models?sort=trending&search=nickypro%2Ftinyllama). I would guess the issue for Xenova was that the key and query matrices needed to...

I think this is just because, for some reason, it is standard convention with language models not to count the token embedding parameters when totalling the number of parameters. For...
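
As a toy illustration of that convention (the sizes below are made up to resemble a small model, not taken from any particular checkpoint):

```python
import torch.nn as nn

# Toy stand-in for a small LM: a token embedding plus an untied LM head.
model = nn.ModuleDict({
    "tok_embeddings": nn.Embedding(32000, 288),   # vocab_size x dim
    "output": nn.Linear(288, 32000, bias=False),  # unembedding / LM head
})

total = sum(p.numel() for p in model.parameters())
embedding = model["tok_embeddings"].weight.numel()
# The "xM" in a model's name conventionally reflects something closer to
# total - embedding than the raw parameter count.
print(f"total={total:,}  excluding embeddings={total - embedding:,}")
```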

Ok, I was briefly confused about why the size is smaller for stories15M. I load:
```
>>> params = torch.load("orig_models/stories15M.pt")
>>> print(sum([v.numel() for k, v in params['model'].items()]))
24407712
>>> print([v.dtype==torch.float32...
```
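
A slightly fuller version of that kind of check, in case it is useful (same checkpoint path as above; this is just how one might inspect it, not code from the repo):

```python
import torch

ckpt = torch.load("orig_models/stories15M.pt", map_location="cpu")
params = ckpt["model"]

n_params = sum(v.numel() for v in params.values())
n_bytes = sum(v.numel() * v.element_size() for v in params.values())
print(f"{n_params:,} parameters, ~{n_bytes / 1e6:.1f} MB of tensor data")

# List any tensors that are not stored in float32.
print({k: str(v.dtype) for k, v in params.items() if v.dtype != torch.float32})
```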

Ok, I have pushed the changes for tied weights and made the fp16 and fp32 models [available on the HuggingFace hub](https://huggingface.co/models?sort=trending&search=nickypro%2Ftinyllama).
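
For anyone who wants to try them, loading one of the converted models should just be the standard `transformers` call; the repo name below is illustrative, so check the search link above for the exact names:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nickypro/tinyllama-15M"  # illustrative; see the hub search link above
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16)

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```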

I wrote some code for conversion in the other direction here: https://github.com/karpathy/llama2.c/pull/395, though I have not fully tested it. It may have relevant information for your needs.
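
For a rough sense of what such a conversion involves, here is a sketch of the state-dict key renaming between the llama2.c-style and HF LLaMA-style naming conventions (illustrative only; the actual PR may differ, e.g. it may also permute the q/k weights for rotary embeddings):

```python
# Illustrative mapping from llama2.c-style keys to HF LLaMA-style keys.
# Only representative keys are shown.
TOP_LEVEL = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}

def rename(key: str) -> str:
    if key in TOP_LEVEL:
        return TOP_LEVEL[key]
    # e.g. "layers.0.attention.wq.weight" -> "model.layers.0.self_attn.q_proj.weight"
    return (key
            .replace("layers.", "model.layers.")
            .replace("attention.wq", "self_attn.q_proj")
            .replace("attention.wk", "self_attn.k_proj")
            .replace("attention.wv", "self_attn.v_proj")
            .replace("attention.wo", "self_attn.o_proj")
            .replace("feed_forward.w1", "mlp.gate_proj")
            .replace("feed_forward.w2", "mlp.down_proj")
            .replace("feed_forward.w3", "mlp.up_proj")
            .replace("attention_norm", "input_layernorm")
            .replace("ffn_norm", "post_attention_layernorm"))

def convert(state_dict: dict) -> dict:
    # Rename every key while leaving the tensors themselves untouched.
    return {rename(k): v for k, v in state_dict.items()}
```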