Support for Mistral Small 3.1
Model description
Please add support for the mistralai/Mistral-Small-3.1-24B-Instruct-2503 model.
Open source status
- [ ] The model implementation is available
- [x] The model weights are available
Provide useful links for the implementation
https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503
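For reference, a launch command along these lines reproduces the failure (a sketch only; the image tag, GPU flags, and shard count are assumptions, not the exact compose setup behind the log below):

```bash
# Sketch of a reproduction; image tag and flags are assumptions.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-3.1-24B-Instruct-2503 \
  --num-shard 2
```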
It is not supported yet:
```
inference-1 | 2025-03-26T11:05:12.546170Z ERROR text_generation_launcher: Error when initializing model
inference-1 | Traceback (most recent call last):
inference-1 | File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
inference-1 | sys.exit(app())
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
inference-1 | return get_command(self)(*args, **kwargs)
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
inference-1 | return self.main(*args, **kwargs)
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 743, in main
inference-1 | return _main(
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 198, in _main
inference-1 | rv = self.invoke(ctx)
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
inference-1 | return _process_result(sub_ctx.command.invoke(sub_ctx))
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
inference-1 | return ctx.invoke(self.callback, **ctx.params)
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
inference-1 | return __callback(*args, **kwargs)
inference-1 | File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
inference-1 | return callback(**use_params)
inference-1 | File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
inference-1 | server.serve(
inference-1 | File "/usr/src/server/text_generation_server/server.py", line 315, in serve
inference-1 | asyncio.run(
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
inference-1 | return runner.run(main)
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
inference-1 | return self._loop.run_until_complete(task)
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
inference-1 | self.run_forever()
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
inference-1 | self._run_once()
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
inference-1 | handle._run()
inference-1 | File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
inference-1 | self._context.run(self._callback, *self._args)
inference-1 | > File "/usr/src/server/text_generation_server/server.py", line 268, in serve_inner
inference-1 | model = get_model_with_lora_adapters(
inference-1 | File "/usr/src/server/text_generation_server/models/__init__.py", line 1690, in get_model_with_lora_adapters
inference-1 | model = get_model(
inference-1 | File "/usr/src/server/text_generation_server/models/__init__.py", line 1654, in get_model
inference-1 | raise NotImplementedError("sharded is not supported for AutoModel")
inference-1 | NotImplementedError: sharded is not supported for AutoModel
```
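Since the traceback shows the launcher falling back to the transformers AutoModel path in `models/__init__.py`, and that path rejects sharding, a single-shard run might at least start (an untested workaround sketch, not proper support; the fallback's behavior and performance for this model are not guaranteed):

```bash
# Untested sketch: force a single shard to avoid the
# "sharded is not supported for AutoModel" check in the fallback path.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-Small-3.1-24B-Instruct-2503 \
  --num-shard 1
```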
+1, seeing the same error trace as @v3ss0n: `sharded is not supported for AutoModel`.
+1, support for this model is very much needed, given its capabilities.
Any news on this? Mistral Small 3.1 is still not supported.