ppl.llm.serving

Results 10 ppl.llm.serving issues

## What are the problems?(screenshots or detailed error messages) In file included from /home/liuxiandong/workspace/ppl/ppl.llm.serving/src/models/llama/llama_worker.h:25:0, from /home/liuxiandong/workspace/ppl/ppl.llm.serving/src/models/llama/llama_worker.cc:18: /home/liuxiandong/workspace/ppl/ppl.llm.serving/src/models/llama/../../utils/mpsc_request_scheduler.h:21:10: fatal error: ppl/common/event_count.h: No such file or directory #include "ppl/common/event_count.h" ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file...

## What are the problems?(screenshots or detailed error messages) Is there a performance-analysis tool, something profiler-related? Or is the only option to use something like Nsight profiling and inspect operator performance ourselves? ## What are the types of GPU/CPU you are using? GPU:A100-80G-SXM4 ## What's the operating system ppl.llm.serving runs on?...

An error was encountered while executing client_qps_measure. Platform: llama-13B on 2 V100 GPUs ``` [INFO][2023-09-13 03:35:21.764][llama_server.cc:539] max_tokens: 75630 [INFO][2023-09-13 03:35:21.827][llama_server.cc:484] VOCAB_SIZE: 32000; BOS ID: 1; EOS ID: 2; PAD ID:...

enhancement

## What are the problems?(screenshots or detailed error messages) ## What are the types of GPU/CPU you are using? ## What's the operating system ppl.llm.serving runs on? ## What's the...

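Exporting a model with ppl.pmx Export produces a large number of warnings of the form "Warning: The shape interface of opmx::XX type is missing" (where XX is, for example, ParallelEmbedding, ColumnParallelLinear, or Reshape). Starting ppl_llm_server with the exported ONNX file then fails with: unsupported op: domain[opmx], type[ParallelEmbedding]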

## What are the problems?(screenshots or detailed error messages) I need to benchmark Llama 2 7B time to first token (TTFT) with OpenPPL, and I have to benchmark it with static...
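As a rough illustration of what a TTFT measurement involves, the sketch below times any iterator of streamed tokens. It does not use the ppl.llm.serving client API; `measure_ttft` and `fake_stream` are hypothetical names introduced here, and `fake_stream` only simulates a server with `time.sleep`.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (time to first token, total time, token count) in seconds."""
    start = time.perf_counter()
    first_token_latency = None
    count = 0
    for _ in stream:
        if first_token_latency is None:
            # Elapsed time until the first token arrives (prefill + queueing).
            first_token_latency = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if first_token_latency is None:
        raise ValueError("stream produced no tokens")
    return first_token_latency, total, count


def fake_stream(n: int, prefill_s: float = 0.05, decode_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response: one prefill delay, then decode steps."""
    time.sleep(prefill_s)
    for i in range(n):
        if i:
            time.sleep(decode_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(8))
    print(f"TTFT {ttft * 1000:.1f} ms, total {total * 1000:.1f} ms, {n} tokens")
```

To measure a real server, `fake_stream` would be replaced by whatever iterator the client library exposes for streamed tokens; the timer should start immediately before the request is sent.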

This PR will fix the compile error due to: ``` /home/xxx/workspace/ppl/ppl.llm.serving/src/models/llama/../../utils/mpsc_request_scheduler.h:21:10: fatal error: ppl/common/event_count.h: No such file or directory #include "ppl/common/event_count.h" ^~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. ```

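The file in question is: https://github.com/OpenPPL/ppl.llm.serving/blob/master/src/utils/prefix_cache_manager.h. Because it uses xxhash64, which is not collision-resistant, an attacker can construct inputs with Hash(a)=Hash(b) and thereby poison the cache. The attacker poisons the cache with a malicious prompt A; when a user later asks B, the answer to A is returned. The vulnerability works in much the same way as the one in vLLM (CVE-2025-25183), and a similar fix could be applied.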
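One common mitigation for this class of problem is to key prefix-cache blocks with a collision-resistant hash instead of a fast non-cryptographic one. The sketch below illustrates the idea in Python with SHA-256. It is not the code in `prefix_cache_manager.h`, nor necessarily the fix vLLM adopted; `block_hash`, `prefix_hashes`, and the block size of 4 are assumptions made for illustration.

```python
import hashlib
import struct
from typing import List, Sequence


def block_hash(parent_digest: bytes, token_ids: Sequence[int]) -> bytes:
    """Hash one block of token ids, chained to the digest of the preceding prefix."""
    h = hashlib.sha256()
    h.update(parent_digest)
    # Length prefix keeps the encoding of the block unambiguous.
    h.update(struct.pack("<I", len(token_ids)))
    h.update(struct.pack(f"<{len(token_ids)}q", *token_ids))
    return h.digest()


def prefix_hashes(token_ids: Sequence[int], block_size: int = 4) -> List[bytes]:
    """Return one cache key per full block; each key commits to the whole prefix so far."""
    digests = []
    parent = b"\x00" * 32  # fixed root value for the empty prefix
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        parent = block_hash(parent, token_ids[start:start + block_size])
        digests.append(parent)
    return digests


if __name__ == "__main__":
    a = prefix_hashes([1, 2, 3, 4, 5, 6, 7, 8])
    b = prefix_hashes([1, 2, 3, 4, 9, 9, 9, 9])
    print(a[0] == b[0], a[1] == b[1])  # shared first block, different second block
```

A cryptographic hash costs more per block than xxhash64. An alternative that keeps the fast hash is to store the token ids alongside each cache entry and compare them on every hit, so that a collision results in a miss rather than a wrong answer.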

## What are the problems?(screenshots or detailed error messages) ## What are the types of GPU/CPU you are using? ## What's the operating system ppl.llm.serving runs on? ## What's the...

## What are the problems?(screenshots or detailed error messages) When testing llama2_7b with offline_inference, the following error is reported: """ [LLMCUDA][pmx/rms_norm_kernel.cc:84] |-DataFormat: NDARRAY [LLMCUDA][pmx/column_parallel_linear_kernel.cc:29] Entry LlmCudaKernel: [/layers.0/w [LLMCUDA][pmx/column_parallel_linear_kernel.cc:36] Input [input]: [LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] TensorName: [/layers.0/attention [LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-Data: 0x1120000000 [LLMCUDA][pmx/column_parallel_linear_kernel.cc:37] |-DimCount: 2...