dash-infer
DashInfer is a native LLM inference engine that aims to deliver industry-leading performance across a range of hardware architectures, including CUDA, x86, and ARMv9.
After installing dashinfer via `pip install`, running inference raises an error.
I would like to know the minimum hardware requirements for running 7B models. With my current configuration I can only run the 1.5B model, with an average throughput of 8.1...
Same as the title.
When executing `pip install dashinfer`, dependencies such as torch, pandas, and tabulate are not installed automatically, because `install_requires` in python/setup.py is not managed correctly.
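A minimal sketch of how the missing runtime dependencies could be declared in python/setup.py so that `pip install dashinfer` pulls them in automatically. The package names come from the issue report; the version bounds and the `setup()` metadata shown here are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical python/setup.py fragment: declaring the runtime
# dependencies reported missing in the issue. Version bounds are
# placeholders and should match what the engine is tested against.
from setuptools import setup, find_packages

install_requires = [
    "torch",      # required by the Python inference examples
    "pandas",     # used for tabular result handling
    "tabulate",   # used for pretty-printing benchmark tables
]

setup(
    name="dashinfer",
    packages=find_packages(),
    # pip resolves and installs these alongside the package itself
    install_requires=install_requires,
)
```

With this in place, `pip install dashinfer` would resolve and install the listed packages in one step instead of requiring a separate manual `pip install torch pandas tabulate`.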
While integrating dashinfer into FastChat, ~~when the prompt token count exceeds engine_max_length~~ when generation_config.max_length < prompt token count < engine_config.engine_max_length, the program cannot recover.
Here is an example config file from examples/python/model_config/config_qwen_v10_7b.json:

```json
{
    "model_name": "Qwen-7B-Chat",
    "model_type": "Qwen_v10",
    "model_path": "~/dashinfer_models/",
    "data_type": "float32",
    "device_type": "CPU",
    "device_ids": [0],
    "multinode_mode": false,
    "engine_config": {
        "engine_max_length": ...
```
