llama.cpp CUDA bug when installed under WSL: "Warming up..." fails and the server does not start
The command is as follows:
.\sparktts_env\python.exe server.py --model_path Spark-TTS-0.5B --backend llama-cpp --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "bfloat16" --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 00:21:31 [INFO] [server:131] >> Warming up...
[Fast-Spark-TTS] 2025-04-03 00:26:59 [ERROR] [spark_engine:313] >> Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output: GGGG… (a long run of repeated "G" characters, truncated)
ERROR:    Traceback (most recent call last):
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\site-packages\starlette\routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\site-packages\fastapi\routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\sparktts_env\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\server.py", line 384, in lifespan
    await warmup_engine(engine)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\server.py", line 132, in warmup_engine
    await async_engine.generate_voice_async(
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 775, in generate_voice_async
    first_output = await generate_audio(segments[0], acoustic_token=None)
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 749, in generate_audio
    generated = await self._generate_audio_tokens(
  File "D:\AI\Fast-Spark-TTS-AllInOne-CUDA\fast_tts\engine\spark_engine.py", line 314, in _generate_audio_tokens
    raise ValueError(err_msg)
ValueError: Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output: GGGG… (repeated "G" characters, truncated)
ERROR: Application startup failed. Exiting.
Using sglang also errors out; the GPU is a 2080 Ti.
(tts) echo@DESKTOP-07VNMO4:~/Fast-Spark-TTS$ python server.py --model_path Spark-TTS-0.5B --backend sglang --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 23:31:42 [INFO] [server:458] >> Starting FastTTS service
[Fast-Spark-TTS] 2025-04-03 23:31:42 [INFO] [server:459] >> Config: Namespace(model_path='Spark-TTS-0.5B', backend='sglang', llm_device='cuda', tokenizer_device='cuda', detokenizer_device='cuda', wav2vec_attn_implementation='eager', llm_attn_implementation='eager', max_length=32768, llm_gpu_memory_utilization=0.6, torch_dtype='auto', cache_implementation=None, role_dir='data/roles', api_key=None, seed=0, batch_size=32, wait_timeout=0.01, host='0.0.0.0', port=8000)
INFO: Started server process [7072]
INFO: Waiting for application startup.
/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.78s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.78s/it]
Capturing batches (avail_mem=7.61 GB): 0%| | 0/4 [00:00<?, ?it/s]
[2025-04-03 23:31:56 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 1999, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 249, in __init__
    self.tp_worker = TpWorkerClass(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.initialize(min_per_gpu_memory)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 207, in initialize
    self.init_cuda_graphs()
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 931, in init_cuda_graphs
    self.cuda_graph_runner = CudaGraphRunner(self)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 265, in __init__
    self.capture()
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 349, in capture
    ) = self.capture_one_batch_size(bs, forward)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 419, in capture_one_batch_size
    self.model_runner.attn_backend.init_forward_metadata_capture_cuda_graph(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 303, in init_forward_metadata_capture_cuda_graph
    self.indices_updater_decode.update(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 553, in update_single_wrapper
    self.call_begin_forward(
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 650, in call_begin_forward
    create_flashinfer_kv_indices_triton[(bs,)](
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/triton/runtime/jit.py", line 345, in
[2025-04-03 23:31:56] Received sigquit from a child process. It usually means the child failed. Killed
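Separately from the flashinfer/CUDA-graph crash above, the mkl-service warnings earlier in this log suggest their own workaround (import numpy first, or force the threading layer). A minimal sketch, assuming the same conda environment; `MKL_SERVICE_FORCE_INTEL` is quoted straight from the error text, while `MKL_THREADING_LAYER=GNU` is an alternative commonly used when libgomp is already loaded:

```shell
# Workaround for: "MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1"
# Either force the Intel layer, as the error message itself suggests:
export MKL_SERVICE_FORCE_INTEL=1
# ...or, alternatively, switch MKL to the GNU (libgomp) threading layer:
# export MKL_THREADING_LAYER=GNU
# Then relaunch server.py with the same arguments as before.
```

Note that this only silences the MKL threading conflict; it does not address the Scheduler exception during CUDA graph capture.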
Installing vllm under WSL gives the same error.
(tts) echo@DESKTOP-07VNMO4:~/Fast-Spark-TTS$ python server.py --model_path Spark-TTS-0.5B --backend sglang --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "float32" --max_length 32768 --llm_gpu_memory_utilization 0.6 --host 0.0.0.0 --port 8000
[Fast-Spark-TTS] 2025-04-03 23:57:25 [INFO] [server:458] >> Starting FastTTS service
[Fast-Spark-TTS] 2025-04-03 23:57:25 [INFO] [server:459] >> Config: Namespace(model_path='Spark-TTS-0.5B', backend='sglang', llm_device='cuda', tokenizer_device='cuda', detokenizer_device='cuda', wav2vec_attn_implementation='sdpa', llm_attn_implementation='sdpa', max_length=32768, llm_gpu_memory_utilization=0.6, torch_dtype='float32', cache_implementation=None, role_dir='data/roles', api_key=None, seed=0, batch_size=32, wait_timeout=0.01, host='0.0.0.0', port=8000)
INFO: Started server process [17006]
INFO: Waiting for application startup.
/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
INFO 04-03 23:57:29 [__init__.py:239] Automatically detected platform cuda.
INFO 04-03 23:57:34 [__init__.py:239] Automatically detected platform cuda.
INFO 04-03 23:57:34 [__init__.py:239] Automatically detected platform cuda.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.84s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:01<00:00, 1.84s/it]
Capturing batches (avail_mem=7.53 GB): 100%|██████████████████████████████████████████████| 4/4 [00:02<00:00, 1.65it/s]
[Fast-Spark-TTS] 2025-04-03 23:57:46 [INFO] [server:101] >> Loading role library: data/roles
[Fast-Spark-TTS] 2025-04-03 23:57:48 [INFO] [server:125] >> Role library loaded; available roles: 刘德华、殷夫人、徐志胜、李靖、赞助商、Donald Trump、哪吒、后羿、周杰伦、陈鲁豫、余承东
[Fast-Spark-TTS] 2025-04-03 23:57:48 [INFO] [server:131] >> Warming up...
[Fast-Spark-TTS] 2025-04-03 23:57:52 [ERROR] [spark_engine:313] >> Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output:
ERROR:    Traceback (most recent call last):
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/starlette/routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/site-packages/fastapi/routing.py", line 133, in merged_lifespan
    async with original_context(app) as maybe_original_state:
  File "/home/echo/anaconda3/envs/tts/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/home/echo/Fast-Spark-TTS/server.py", line 384, in lifespan
    await warmup_engine(engine)
  File "/home/echo/Fast-Spark-TTS/server.py", line 132, in warmup_engine
    await async_engine.generate_voice_async(
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 775, in generate_voice_async
    first_output = await generate_audio(segments[0], acoustic_token=None)
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 749, in generate_audio
    generated = await self._generate_audio_tokens(
  File "/home/echo/Fast-Spark-TTS/fast_tts/engine/spark_engine.py", line 314, in _generate_audio_tokens
    raise ValueError(err_msg)
ValueError: Semantic tokens prediction is empty, prompt: <|task_controllable_tts|><|start_content|>测试音频。<|end_content|><|start_style_label|><|gender_0|><|pitch_label_2|><|speed_label_2|><|end_style_label|>, llm output:
ERROR: Application startup failed. Exiting.
The GPU is a 2080 Ti.
After a restart, vllm succeeded; sglang still errors out and exits during warm-up.
This error occurs with sglang; it does not occur with vllm or torch.
Yes, I switched to vllm; torch is too slow.
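For reference, the launch that reportedly worked can be sketched as follows. This is an assumption pieced together from the thread, not a confirmed command: the `--backend vllm` value is inferred from the backend names mentioned here (llama-cpp / sglang / torch / vllm), and `float32` is kept because Turing-generation cards like the 2080 Ti lack native bfloat16 support:

```shell
# Hypothetical working launch on the 2080 Ti, per the thread's resolution:
# vllm backend instead of sglang, float32 instead of bfloat16.
python server.py \
  --model_path Spark-TTS-0.5B \
  --backend vllm \
  --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda \
  --torch_dtype "float32" \
  --max_length 32768 \
  --llm_gpu_memory_utilization 0.6 \
  --host 0.0.0.0 --port 8000
```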