
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.

18 dash-infer issues, sorted by recently updated

After installing dashinfer via `pip install`, running inference raises an error.

I would like to know the minimum hardware requirements for running 7B models. With my current configuration I can only run a 1.5B model, at an average throughput of 8.1...

When executing `pip install dashinfer`, dependencies such as torch, pandas, and tabulate are not installed automatically, because `install_requires` in python/setup.py is not managed correctly.
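A minimal sketch of how python/setup.py could declare these runtime dependencies so pip pulls them in automatically; the exact package list and the absence of version pins are assumptions for illustration, not taken from the repository:

```python
# python/setup.py (sketch, not the repository's actual file)
from setuptools import setup, find_packages

# Hypothetical list of the runtime dependencies the issue says are
# missing; real version pins would need to match DashInfer's needs.
INSTALL_REQUIRES = [
    "torch",
    "pandas",
    "tabulate",
]

setup(
    name="dashinfer",
    packages=find_packages(),
    # Declaring them here makes `pip install dashinfer` install them too.
    install_requires=INSTALL_REQUIRES,
)
```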

While integrating dashinfer into fastchat, ~~when the prompt tokens exceed engine_max_length~~ when .generation_config.max_length < prompt tokens < .engine_config.engine_max_length, the program cannot recover.
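The failure window described in this report can be sketched as a pre-flight check before submitting a request; the function name and the exact boundary conditions are assumptions for illustration, not DashInfer's actual logic:

```python
def in_unrecoverable_window(prompt_tokens: int,
                            gen_max_length: int,
                            engine_max_length: int) -> bool:
    """Return True in the window the issue reports as unrecoverable:
    the engine accepts the prompt (it fits engine_max_length) but
    generation_config.max_length is already exceeded, leaving no
    budget for generation."""
    return gen_max_length < prompt_tokens < engine_max_length

# Example: a 1800-token prompt with max_length=1024 and
# engine_max_length=2048 falls inside the reported window.
print(in_unrecoverable_window(1800, 1024, 2048))  # True
print(in_unrecoverable_window(500, 1024, 2048))   # False
```

A caller could reject or truncate such prompts up front instead of letting the engine hang.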

Here is an example of a config file at examples/python/model_config/config_qwen_v10_7b.json:

```json
{
  "model_name": "Qwen-7B-Chat",
  "model_type": "Qwen_v10",
  "model_path": "~/dashinfer_models/",
  "data_type": "float32",
  "device_type": "CPU",
  "device_ids": [ 0 ],
  "multinode_mode": false,
  "engine_config": {
    "engine_max_length": ...
```

![image](https://github.com/modelscope/dash-infer/assets/71427899/f23cd445-651a-4dc0-9c3f-c8142b910a4b)