
Support for Mistral Small 3.1

Open meetzuber opened this issue 1 year ago • 4 comments

Model description

Please add support for mistralai/Mistral-Small-3.1-24B-Instruct-2503 model.

Open source status

  • [ ] The model implementation is available
  • [x] The model weights are available

Provide useful links for the implementation

https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503

meetzuber avatar Mar 22 '25 01:03 meetzuber

It is not supported yet:

inference-1  | 2025-03-26T11:05:12.546170Z ERROR text_generation_launcher: Error when initializing model
inference-1  | Traceback (most recent call last):
inference-1  |   File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
inference-1  |     sys.exit(app())
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
inference-1  |     return get_command(self)(*args, **kwargs)
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
inference-1  |     return self.main(*args, **kwargs)
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 743, in main
inference-1  |     return _main(
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 198, in _main
inference-1  |     rv = self.invoke(ctx)
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
inference-1  |     return _process_result(sub_ctx.command.invoke(sub_ctx))
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
inference-1  |     return ctx.invoke(self.callback, **ctx.params)
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
inference-1  |     return __callback(*args, **kwargs)
inference-1  |   File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
inference-1  |     return callback(**use_params)
inference-1  |   File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
inference-1  |     server.serve(
inference-1  |   File "/usr/src/server/text_generation_server/server.py", line 315, in serve
inference-1  |     asyncio.run(
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
inference-1  |     return runner.run(main)
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
inference-1  |     return self._loop.run_until_complete(task)
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
inference-1  |     self.run_forever()
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
inference-1  |     self._run_once()
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
inference-1  |     handle._run()
inference-1  |   File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
inference-1  |     self._context.run(self._callback, *self._args)
inference-1  | > File "/usr/src/server/text_generation_server/server.py", line 268, in serve_inner
inference-1  |     model = get_model_with_lora_adapters(
inference-1  |   File "/usr/src/server/text_generation_server/models/__init__.py", line 1690, in get_model_with_lora_adapters
inference-1  |     model = get_model(
inference-1  |   File "/usr/src/server/text_generation_server/models/__init__.py", line 1654, in get_model
inference-1  |     raise NotImplementedError("sharded is not supported for AutoModel")
inference-1  | NotImplementedError: sharded is not supported for AutoModel

v3ss0n avatar Mar 26 '25 11:03 v3ss0n
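
The `NotImplementedError` above comes from TGI's model dispatch: architectures it does not recognize fall back to transformers' `AutoModel`, and that fallback path refuses tensor-parallel sharding. A minimal sketch of that behavior (the names `SUPPORTED_ARCHITECTURES` and the return values are simplified stand-ins, not TGI's actual code):

```python
# Simplified sketch of the dispatch logic behind the traceback above.
# SUPPORTED_ARCHITECTURES is a hypothetical stand-in for TGI's registry
# of architectures with dedicated (flash-attention) implementations.
SUPPORTED_ARCHITECTURES = {"MistralForCausalLM", "LlamaForCausalLM"}

def get_model(architecture: str, sharded: bool) -> str:
    if architecture in SUPPORTED_ARCHITECTURES:
        # Known architecture: use the dedicated implementation,
        # which supports tensor-parallel sharding.
        return f"flash:{architecture}"
    # Unknown architecture: fall back to transformers' AutoModel,
    # which TGI cannot shard across GPUs.
    if sharded:
        raise NotImplementedError("sharded is not supported for AutoModel")
    return f"auto:{architecture}"
```

So the error does not mean the weights are broken; it means the launcher tried to shard a model that only has the generic fallback path.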

+1, seeing the same error trace as @v3ss0n: `sharded is not supported for AutoModel`

ZQ-Dev8 avatar Apr 01 '25 21:04 ZQ-Dev8

+1, support is very much needed, given the capabilities of this model.

RobCzikkel avatar Apr 02 '25 11:04 RobCzikkel

Any news on this? Mistral Small 3.1 is still not supported.

gpadiolleau avatar Sep 15 '25 07:09 gpadiolleau
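
Since the error is raised only when sharding is requested, one possible (untested) workaround is to force an unsharded launch with `--num-shard 1`, so TGI does not attempt to shard the `AutoModel` fallback. Note this avoids the `NotImplementedError` at startup, but the model may still fail to load or will run without TGI's optimized kernels:

```shell
# Untested workaround sketch: pin to a single shard so the AutoModel
# fallback path is not asked to shard.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-3.1-24B-Instruct-2503 \
  --num-shard 1
```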