llama.cpp Feature Request: Support Mistral-Large

Prerequisites

[X] I am running the latest code. Mention the version if possible as well.
[X] I carefully followed the README.md.
[X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

How about supporting https://huggingface.co/mistralai/Mistral-Large-Instruct-2407?

It seems to me that you will have a lot of work for a long time)) New models are released almost weekly.

Motivation

Blog: https://mistral.ai/news/mistral-large-2407/

Possible Implementation

No response

Jul 24 '24 19:07 hackey

Hey @hackey, have you tried it? It seems to work prefectly fine for me. There is an open question about the context length though, which is marked as 32K in the config rather than the 128K from their blog post. I presume some RoPE setting must be used, but I haven't found any details about that just yet.

Jul 24 '24 19:07 dranger003

Mistral developers have made changes to the context length. So this request can be closed. Llama.cpp supports this model.

Jul 26 '24 06:07 hackey

I still can't quantify it properly

(llm_venv_llamacpp) xlab@xlab:/mnt/Model/MistralAI/llm_llamacpp$ python convert_hf_to_gguf.py /mnt/Model/MistralAI/Mistral-Large-Instruct-2407 --outfile ../llm_quantized/mistral_large2_instruct_f16.gguf --outtype f16 --no-lazy
INFO:hf-to-gguf:Loading model: Mistral-Large-Instruct-2407
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 12288
INFO:hf-to-gguf:gguf: feed forward length = 28672
INFO:hf-to-gguf:gguf: head count = 96
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
{%- endif %}

{{- bos_token }}
{%- for message in loop_messages %}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif %}
    {%- if message['role'] == 'user' %}
        {%- if loop.last and system_message is defined %}
            {{- '[INST] ' + system_message + '\n\n' + message['content'] + '[/INST]' }}
        {%- else %}
            {{- '[INST] ' + message['content'] + '[/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + eos_token}}
    {%- else %}
        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}
    {%- endif %}
{%- endfor %}

INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:../llm_quantized/mistral_large2_instruct_f16.gguf: n_tensors = 0, total_size = negligible - metadata only
Writing: 0.00byte [00:00, ?byte/s]
INFO:hf-to-gguf:Model successfully exported to ../llm_quantized/mistral_large2_instruct_f16.gguf

Aug 01 '24 04:08 17Reset