Harvey
I would be grateful for any input
There is no way to do it, as far as I can tell. If I may ask, is there actually a team maintaining this project?
> [#1501 (comment)](https://github.com/kvcache-ai/ktransformers/issues/1501#issuecomment-3296117542)
>
> I ran into this problem too. After applying the fix:
>
> ```
> Injecting cache as default
> Injecting lm_head as ktransformers.operators.linear . KTransformersLinear
> loading model.embed_tokens.weight to cpu
> loading model.layers.0.linear_attn.dt_bias to cuda
> loading model.layers.0.linear_attn.A_log to cuda
> loading model.layers.0.linear_attn.conv1d.weight...
> ```
When I use the following command with the original Qwen3-Next model, it starts and runs normally (you must use `USE_BALANCE_SERVE=1 bash ./install.sh`). But when I try the quantized version https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8, it keeps erroring. Is that because official support hasn't been added yet?

```
python ktransformers/server/main.py \
  --port 10021 \
  --model_path /home/ubuntu/data/Qwen3-Next-80B-A3B-Thinking-FP8 \
  --gguf_path /home/ubuntu/data/Qwen3-Next-80B-A3B-Thinking-FP8 \
  --model_name Qwen3NextForCausalLM \
  --optimize_config_path /home/ubuntu/ktransformers/ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml \
  --max_new_tokens 1024...
```
Have you gotten it running yet? I can run the original qwen3-next, but the https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 version errors out:

```
Injecting lm_head as ktransformers.operators.linear . KTransformersLinear
loading model.embed_tokens.weight to cpu
loading model.layers.0.linear_attn.dt_bias to cuda
loading model.layers.0.linear_attn.A_log to cuda
loading model.layers.0.linear_attn.conv1d.weight to cuda:0
Process SpawnProcess-1:
Traceback (most...
```
I had the same error, and it kept asking me to set ANTHROPIC_API_KEY.