llama.cpp CUDA bug when installed under WSL: "Warming up..." fails and the server does not start
The command is as follows:
.\sparktts_env\python.exe server.py --model_path Spark-TTS-0.5B --backend llama-cpp --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "bfloat16" --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 00:21:31 [INFO] [server:131] >> Warming up...
[Fast-Spark-TTS] 2025-04-03 00:26:59 [ERROR] [spark_engine:313] >> Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output: GGGG… (a long run of repeated "G" characters, truncated)
ERROR:    Traceback (most recent call last):
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\site-packages\starlette\routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\site-packages\fastapi\routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\server.py", line 384, in lifespan
    await warmup_engine(engine)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\server.py", line 132, in warmup_engine
    await async_engine.generate_voice_async(
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 775, in generate_voice_async
    first_output = await generate_audio(segments[0], acoustic_token=None)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 749, in generate_audio
    generated = await self._generate_audio_tokens(
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 314, in _generate_audio_tokens
    raise ValueError(err_msg)
ValueError: Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output: GGGG… (repeated "G" characters, truncated)
ERROR: Application startup failed. Exiting.
Using sglang also errors out; the GPU is a 2080 Ti.
(tts) echo@DESKTOP-07VNMO4:~/Fast-Spark-TTS$ python server.py --model_path Spark-TTS-0.5B --backend sglang --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 23:31:42 [INFO] [server:458] >> Starting FastTTS service
[Fast-Spark-TTS] 2025-04-03 23:31:42 [INFO] [server:459] >> Config: Namespace(model_path='Spark-TTS-0.5B', backend='sglang', llm_device='cuda', tokenizer_device='cuda', detokenizer_device='cuda', wav2vec_attn_implementation='eager', llm_attn_implementation='eager', max_length=32768, llm_gpu_memory_utilization=0.6, torch_dtype='auto', cache_implementation=None, role_dir='data/roles', api_key=None, seed=0, batch_size=32, wait_timeout=0.01, host='0.0.0.0', port=8000)
INFO: Started server process [7072]
INFO: Waiting for application startup.
/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.78s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.78s/it]
Capturing batches (avail_mem=7.61 GB): 0%| | 0/4 [00:00<?, ?it/s]
[2025-04-03 23:31:56 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 1999, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 249, in __init__
    self.tp_worker = TpWorkerClass(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.initialize(min_per_gpu_memory)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 207, in initialize
    self.init_cuda_graphs()
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 931, in init_cuda_graphs
    self.cuda_graph_runner = CudaGraphRunner(self)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 265, in __init__
    self.capture()
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 349, in capture
    ) = self.capture_one_batch_size(bs, forward)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 419, in capture_one_batch_size
    self.model_runner.attn_backend.init_forward_metadata_capture_cuda_graph(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 303, in init_forward_metadata_capture_cuda_graph
    self.indices_updater_decode.update(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 553, in update_single_wrapper
    self.call_begin_forward(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 650, in call_begin_forward
    create_flashinfer_kv_indices_triton[(bs,)](
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/triton/runtime/jit.py", line 345, in
[2025-04-03 23:31:56] Received sigquit from a child process. It usually means the child failed. Killed
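Separately from the flashinfer/CUDA-graph crash above, the mkl-service warnings earlier in this log suggest their own workaround (import numpy first, or force the threading layer). A minimal sketch, assuming the same conda environment; `MKL_SERVICE_FORCE_INTEL` is quoted straight from the error text, while `MKL_THREADING_LAYER=GNU` is an alternative commonly used when libgomp is already loaded:

```shell
# Workaround for: "MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1"
# Either force the Intel layer, as the error message itself suggests:
export MKL_SERVICE_FORCE_INTEL=1
# ...or, alternatively, switch MKL to the GNU (libgomp) threading layer:
# export MKL_THREADING_LAYER=GNU
# Then relaunch server.py with the same arguments as before.
```

Note that this only silences the MKL threading conflict; it does not address the Scheduler exception during CUDA graph capture.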
Installing vllm under WSL gives the same error.
(tts) echo@DESKTOP-07VNMO4:~/Fast-Spark-TTS$ python server.py --model_path Spark-TTS-0.5B --backend sglang --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "float32" --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 23:57:25 [INFO] [server:458] >> Starting FastTTS service
[Fast-Spark-TTS] 2025-04-03 23:57:25 [INFO] [server:459] >> Config: Namespace(model_path='Spark-TTS-0.5B', backend='sglang', llm_device='cuda', tokenizer_device='cuda', detokenizer_device='cuda', wav2vec_attn_implementation='sdpa', llm_attn_implementation='sdpa', max_length=32768, llm_gpu_memory_utilization=0.6, torch_dtype='float32', cache_implementation=None, role_dir='data/roles', api_key=None, seed=0, batch_size=32, wait_timeout=0.01, host='0.0.0.0', port=8000)
INFO: Started server process [17006]
INFO: Waiting for application startup.
/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
INFO 04-03 23:57:29 [__init__.py:239] Automatically detected platform cuda.
INFO 04-03 23:57:34 [__init__.py:239] Automatically detected platform cuda.
INFO 04-03 23:57:34 [__init__.py:239] Automatically detected platform cuda.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.84s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.84s/it]
Capturing batches (avail_mem=7.53 GB): 100%|██████████████████████████████████████████████| 4/4 [00:02<00:00, 1.65it/s]
[Fast-Spark-TTS] 2025-04-03 23:57:46 [INFO] [server:101] >> Loading role library: data/roles
[Fast-Spark-TTS] 2025-04-03 23:57:48 [INFO] [server:125] >> Role library loaded; available roles: 刘德华、殷夫人、徐志胜、李靖、赞助商、Donald Trump、哪吒、后羿、周杰伦、陈鲁豫、余承东
[Fast-Spark-TTS] 2025-04-03 23:57:48 [INFO] [server:131] >> Warming up...
[Fast-Spark-TTS] 2025-04-03 23:57:52 [ERROR] [spark_engine:313] >> Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output:
ERROR:    Traceback (most recent call last):
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/starlette/routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/home/echo/Fast-Spark-TTS/server.py", line 384, in lifespan
    await warmup_engine(engine)
  File "/home/echo/Fast-Spark-TTS/server.py", line 132, in warmup_engine
    await async_engine.generate_voice_async(
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 775, in generate_voice_async
    first_output = await generate_audio(segments[0], acoustic_token=None)
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 749, in generate_audio
    generated = await self._generate_audio_tokens(
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 314, in _generate_audio_tokens
    raise ValueError(err_msg)
ValueError: Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output:
ERROR: Application startup failed. Exiting.
The GPU is a 2080 Ti.
After a restart, vllm succeeded; sglang still errors out and exits during warm-up.
This error occurs with sglang; it does not occur with vllm or torch.
Yes, I switched to vllm; torch is too slow.
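For reference, the launch that reportedly worked can be sketched as follows. This is an assumption pieced together from the thread, not a confirmed command: the `--backend vllm` value is inferred from the backend names mentioned here (llama-cpp / sglang / torch / vllm), and `float32` is kept because Turing-generation cards like the 2080 Ti lack native bfloat16 support:

```shell
# Hypothetical working launch on the 2080 Ti, per the thread's resolution:
# vllm backend instead of sglang, float32 instead of bfloat16.
python server.py \
  --model_path Spark-TTS-0.5B \
  --backend vllm \
  --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda \
  --torch_dtype "float32" \
  --max_length 32768 \
  --llm_gpu_memory_utilization 0.6 \
  --host 0.0.0.0 --port 8000
```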