HeroSong666
Or, I noticed that on https://huggingface.co/bigcode/starcoder2-3b the Inference API can generate code piece by piece, continuing each time I press Compute. How can I implement such functionality? (For example, in...
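If the goal is just to mimic that piece-by-piece behaviour, one simple approach is to feed the text produced so far back in as the prompt on each "Compute" press. A minimal local sketch with `transformers` (the model name comes from the linked page; the loop count and `max_new_tokens` are arbitrary choices for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "def fibonacci(n):"
for _ in range(3):  # each iteration plays the role of one "Compute" press
    inputs = tok(text, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    # decode the full sequence and use it as the prompt for the next round
    text = tok.decode(out[0], skip_special_tokens=True)

print(text)
```

The hosted widget behaves similarly: it appends the new completion to the textbox, so pressing Compute again simply continues from the extended prompt.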
> A community member is already working on contributing this. Could you @ me when it goes live? Thanks!
I need to input more than 2000 characters of text. Where can I modify the maximum input text length limit? Thanks!
By the way, when I talk to the model in the Playground chat box, if I ask a code-related question, is the answer generated based on the code model, or...
I am new to Tabby, and I am trying to build a Python API client for the Tabby server so that I can access the server through Python and chat (rather than...
> The `tabby_python_client` is private to `tabby-eval` and should not be considered stable as an SDK. > > For API usage, please refer directly to our REST API documentation: https://tabby.tabbyml.com/api...
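Given that, a thin client over the REST API is usually enough. The sketch below assumes a local Tabby server at `http://localhost:8080` that exposes an OpenAI-style `/v1/chat/completions` endpoint with bearer-token auth; the exact paths and response schema for your version should be checked against the API docs linked above:

```python
import requests

TABBY_URL = "http://localhost:8080"  # assumed local server address
TOKEN = "your-auth-token"            # assumed; from the Tabby admin UI if auth is enabled

def chat(messages):
    # POST an OpenAI-style chat request; the schema may differ between Tabby
    # versions, so treat the REST API documentation as the source of truth.
    resp = requests.post(
        f"{TABBY_URL}/v1/chat/completions",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"messages": messages, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(chat([{"role": "user", "content": "Write a hello world in Python."}]))
```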
My current gensim version is 4.3.0.
> Can you confirm whether the model is fully loaded onto one GPU? If it is, this is the expected behaviour. Ollama will use a single GPU if the model fits, since splitting the model across multiple GPUs causes a performance hit. This frees the other GPUs for other models. > > If you do want it spread across all GPUs, you can disable this behaviour with the environment variable `OLLAMA_SCHED_SPREAD=1` The model is fully loaded onto the one GPU. In my usage scenario, there may...
> Can you confirm if the model is fully loaded onto the one GPU? If it is, this is the expected behaviour. Ollama will use a single GPU if the...
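For reference, a minimal sketch of starting the server with that variable set from Python; the variable name and its effect are taken from the quoted reply, and everything else is ordinary subprocess usage:

```python
import os
import subprocess

# Launch `ollama serve` with OLLAMA_SCHED_SPREAD=1 so the scheduler spreads a
# single model across all visible GPUs instead of packing it onto one.
env = dict(os.environ, OLLAMA_SCHED_SPREAD="1")
subprocess.run(["ollama", "serve"], env=env)
```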