Yi-sir
```python
# inference/model.py +590
indices = torch.topk(scores, self.topk, dim=-1)[1]
weights = original_scores.gather(1, indices)
```

Are these lines equivalent to the code below?

```python
weights, indices = torch.topk(scores, self.topk, dim=-1)
```

It...
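For context, the two snippets only agree when `scores` and `original_scores` are the same tensor; they diverge whenever the selection scores were modified (e.g. a routing bias added) before the top-k. A minimal sketch illustrating the difference; the shapes and bias values here are made up, not the repo's actual routing code:

```python
import torch

# Hypothetical setup: 2 tokens, 4 experts, top-2 routing.
original_scores = torch.tensor([[0.1, 0.4, 0.3, 0.2],
                                [0.5, 0.1, 0.2, 0.2]])
bias = torch.tensor([0.0, 0.0, 0.5, 0.0])  # e.g. a load-balancing bias
scores = original_scores + bias
topk = 2

# Snippet 1: pick indices on the (biased) scores, but gather the
# returned weights from the unbiased original_scores.
indices = torch.topk(scores, topk, dim=-1)[1]
weights = original_scores.gather(1, indices)

# Snippet 2: values and indices both come from the same tensor.
weights2, indices2 = torch.topk(scores, topk, dim=-1)

print(torch.equal(weights, weights2))  # False here; True only if scores == original_scores
```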
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- ...
1. Submitted the general_perf TPU backend, including compilation, runtime, and README. Besides the TPU backend itself, the following changes are included:
   1.1 general_perf/backends/CPU/calculate_cpu_diff.sh: check whether the backend is TPU and, if so, skip the venv
   1.2 general_perf/launch.py: pass hardware_type as an argument when invoking the shell script above
   1.3 general_perf/launch.py: fixed an argument-passing error that caused workload parsing to fail when compile_only is true
   1.4 general_perf/core/perf_engine.py: replaced the CPU-model detection with safer code (see the sketch after this list); the old code could crash the program when detection failed in some cases
   1.5 general_perf/backends/CPU/: moved `import torch` ahead of `import tensorflow` in the backend files, because importing in the old order inside sophgo/tpuc_dev makes the process hang
   1.6 general_perf/backends/runtime_backend.py: fixed the docstring of the load() method
   So far only fp32 1b resnet50 and yolov5 have been tested; more test coverage and an improved README will follow.
2. Submitted the llm_perf TPU backend, including the scheduling part but not yet modeling_xxx.py. The model part will be submitted soon.
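As a rough illustration of what "safer CPU-model detection" (item 1.4) can look like: a minimal sketch, not the actual patch; the /proc/cpuinfo approach and the "unknown" fallback are assumptions:

```python
import subprocess

def get_cpu_model() -> str:
    """Best-effort CPU model lookup that degrades gracefully
    instead of crashing the whole run when detection fails."""
    try:
        out = subprocess.check_output(
            ["sh", "-c", "grep -m1 'model name' /proc/cpuinfo"],
            text=True,
        )
        return out.split(":", 1)[1].strip()
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        # e.g. non-Linux host or unexpected /proc/cpuinfo format.
        return "unknown"
```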
While reading the code, I saw that distributed inference loads weights from the /world_size/rank directory under the model path. I found the logic in split_llama.py that splits the weights by world_size and rank, but I couldn't find any file that actually calls split_llama.py. Could you explain how this part is run?
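For reference, a generic sketch of the kind of per-rank split described above; the file names, the naive column split, and the directory layout are assumptions mirroring the question, not split_llama.py's actual logic:

```python
import os
import torch

def split_checkpoint(model_path: str, world_size: int) -> None:
    """Split a full state dict into per-rank shards stored under
    <model_path>/<world_size>/<rank>/ (hypothetical layout)."""
    state = torch.load(os.path.join(model_path, "model.pt"))
    for rank in range(world_size):
        shard = {}
        for name, tensor in state.items():
            if tensor.dim() == 2:
                # Naive column split across ranks (illustration only).
                shard[name] = tensor.chunk(world_size, dim=-1)[rank]
            else:
                shard[name] = tensor
        out_dir = os.path.join(model_path, str(world_size), str(rank))
        os.makedirs(out_dir, exist_ok=True)
        torch.save(shard, os.path.join(out_dir, "model.pt"))
```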
### Bug Report

When I ran the example code from https://github.com/kvcache-ai/Mooncake/blob/c31bb270aeda4e32cb7a319baebaee8b046cdd9a/docs/source/getting_started/quick-start.md?plain=1#L23 , I got error messages like:

```
Waiting for server buffer information...
Received server info - Session ID: localhost:15465
Server...
```