zjding
> > I have upgraded to dify 0.5.3, but max tokens can still only be set to 2048.
> > > BTW, max_tokens is different from context_size; max_tokens specifies...
> @lileiseven Something like this. Thanks.

Maybe that works on Ollama, but Xinference doesn't have such options; I can only set less than 2048 in dify's interface, is that...
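The max_tokens vs. context_size distinction discussed above can be sketched as a toy check (this is illustrative only, not any provider's real API):

```python
# max_tokens caps only the generated completion; context_size is the model's
# full window, which the prompt and the completion must share.
def fits_in_window(prompt_tokens: int, max_tokens: int, context_size: int) -> bool:
    """Return True if the prompt plus the requested completion fit the window."""
    return prompt_tokens + max_tokens <= context_size

# A 4096-token window with a 3000-token prompt leaves room for at most
# 1096 completion tokens:
print(fits_in_window(3000, 1096, 4096))   # True
print(fits_in_window(3000, 2048, 4096))   # False
```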
Oh, I found it. The URLs in get_data.sh have been changed to:
wget https://msmarco.z22.web.core.windows.net/msmarcoranking/qidpidtriples.train.full.2.tsv.gz
wget https://msmarco.z22.web.core.windows.net/msmarcoranking/qrels.train.tsv -O qrels.train.tsv
> This is probably related to flash-attn

Very likely, but the Turing architecture doesn't support flash-attn. Does DeepSpeed have to use flash-attn? DeepSpeed's configuration is a bit complex and I haven't fully figured it out.
> flash-attn is not required

Thanks for the reply. Could you briefly explain how to disable flash-attn under this framework? I tried disabling it but still got errors; maybe my approach was wrong.
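One common way to disable flash-attn (assuming the framework loads models through Hugging Face transformers, which the thread doesn't confirm) is to select the standard attention backend via `attn_implementation="eager"`, which also works on Turing GPUs. A minimal sketch:

```python
# Hedged sketch: build the from_pretrained kwargs that pick the attention
# backend. "eager" is the standard PyTorch attention path and avoids
# flash-attn entirely; "flash_attention_2" requires Ampere or newer GPUs.
def attention_kwargs(use_flash_attn: bool = False) -> dict:
    """Return kwargs controlling the attention implementation."""
    impl = "flash_attention_2" if use_flash_attn else "eager"
    return {"attn_implementation": impl}

# Usage (model id is a placeholder):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("some/model", **attention_kwargs(False))
```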
Support different model providers such as Gemini, DeepSeek, etc., and change the tokenizer to match the selected model.
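One way the requested provider-to-tokenizer matching could look is a simple lookup table; this is a hypothetical sketch only, and neither the mapping nor the placeholder names come from any real codebase:

```python
# Hypothetical provider -> tokenizer mapping; real identifiers would have to
# be filled in per provider. "cl100k_base" is a real tiktoken encoding; the
# other values are placeholders.
PROVIDER_TOKENIZER = {
    "openai": "cl100k_base",
    "gemini": "gemini-tokenizer-placeholder",
    "deepseek": "deepseek-tokenizer-placeholder",
}

def tokenizer_for(provider: str, default: str = "cl100k_base") -> str:
    """Pick a tokenizer identifier matching the configured provider."""
    return PROVIDER_TOKENIZER.get(provider, default)
```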