Yi-sir
```python
# inference/model.py +590
indices = torch.topk(scores, self.topk, dim=-1)[1]
weights = original_scores.gather(1, indices)
```

Are these lines equivalent to the code below?

```python
weights, indices = torch.topk(scores, self.topk, dim=-1)
```

It...
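For context, the two snippets only agree when `scores` and `original_scores` are the same tensor; they diverge whenever the selection scores were modified (e.g. a routing bias added) before the top-k. A minimal sketch illustrating the difference; the shapes and bias values here are made up, not the repo's actual routing code:

```python
import torch

# Hypothetical setup: 2 tokens, 4 experts, top-2 routing.
original_scores = torch.tensor([[0.1, 0.4, 0.3, 0.2],
                                [0.5, 0.1, 0.2, 0.2]])
bias = torch.tensor([0.0, 0.0, 0.5, 0.0])  # e.g. a load-balancing bias
scores = original_scores + bias
topk = 2

# Snippet 1: pick indices on the (biased) scores, but gather the
# returned weights from the unbiased original_scores.
indices = torch.topk(scores, topk, dim=-1)[1]
weights = original_scores.gather(1, indices)

# Snippet 2: values and indices both come from the same tensor.
weights2, indices2 = torch.topk(scores, topk, dim=-1)

print(torch.equal(weights, weights2))  # False here; True only if scores == original_scores
```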
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- ...
1. Submitted the general_perf TPU backend, including compilation, runtime, and README. Besides the TPU backend itself, the following changes are included:
   1.1 general_perf/backends/CPU/calculate_cpu_diff.sh: check whether the backend is TPU and, if so, skip the venv
   1.2 general_perf/launch.py: pass hardware_type as an argument when invoking the shell script above
   1.3 general_perf/launch.py: fixed an argument-passing error that caused workload parsing to fail when compile_only is true
   1.4 general_perf/core/perf_engine.py: replaced the CPU-model detection with safer code (see the sketch after this list); the old code could crash the program when detection failed in some cases
   1.5 general_perf/backends/CPU/: moved `import torch` ahead of `import tensorflow` in the backend files, because importing in the old order inside sophgo/tpuc_dev makes the process hang
   1.6 general_perf/backends/runtime_backend.py: fixed the docstring of the load() method
   So far only fp32 1b resnet50 and yolov5 have been tested; more test coverage and an improved README will follow.
2. Submitted the llm_perf TPU backend, including the scheduling part but not yet modeling_xxx.py. The model part will be submitted soon.
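As a rough illustration of what "safer CPU-model detection" (item 1.4) can look like: a minimal sketch, not the actual patch; the /proc/cpuinfo approach and the "unknown" fallback are assumptions:

```python
import subprocess

def get_cpu_model() -> str:
    """Best-effort CPU model lookup that degrades gracefully
    instead of crashing the whole run when detection fails."""
    try:
        out = subprocess.check_output(
            ["sh", "-c", "grep -m1 'model name' /proc/cpuinfo"],
            text=True,
        )
        return out.split(":", 1)[1].strip()
    except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
        # e.g. non-Linux host or unexpected /proc/cpuinfo format.
        return "unknown"
```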
While reading the code, I saw that distributed inference loads weights from the /world_size/rank directory under the model path. I found the logic in split_llama.py that splits the weights by world_size and rank, but I couldn't find any file that actually calls split_llama.py. Could you explain how this part is run?
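For reference, a generic sketch of the kind of per-rank split described above; the file names, the naive column split, and the directory layout are assumptions mirroring the question, not split_llama.py's actual logic:

```python
import os
import torch

def split_checkpoint(model_path: str, world_size: int) -> None:
    """Split a full state dict into per-rank shards stored under
    <model_path>/<world_size>/<rank>/ (hypothetical layout)."""
    state = torch.load(os.path.join(model_path, "model.pt"))
    for rank in range(world_size):
        shard = {}
        for name, tensor in state.items():
            if tensor.dim() == 2:
                # Naive column split across ranks (illustration only).
                shard[name] = tensor.chunk(world_size, dim=-1)[rank]
            else:
                shard[name] = tensor
        out_dir = os.path.join(model_path, str(world_size), str(rank))
        os.makedirs(out_dir, exist_ok=True)
        torch.save(shard, os.path.join(out_dir, "model.pt"))
```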
### Bug Report

When I ran the example code from https://github.com/kvcache-ai/Mooncake/blob/c31bb270aeda4e32cb7a319baebaee8b046cdd9a/docs/source/getting_started/quick-start.md?plain=1#L23 , I got error messages like:

```
Waiting for server buffer information...
Received server info - Session ID: localhost:15465
Server...
```