Chen Liang comments

4 comments by Chen Liang

How the performance VS vLLM inference（vLLM vs Lookahead）

We haven't run a full benchmark yet. However, following the script in the vLLM repo, [benchmarks/benchmark_latency.py](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py), you can run a simple benchmark:

```python
import argparse
import time
from pathlib import Path
from...
```
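The script in this comment is truncated in the listing, so its body is not reproduced here. As a rough illustration of what a simple latency benchmark involves, the sketch below times an arbitrary callable with warm-up runs and reports summary statistics. It is not the vLLM script: `measure_latency` and `fake_generate` are hypothetical names introduced for this example, and `fake_generate` is only a stand-in for a real generation call.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    run_once: Callable[[], object], warmup: int = 3, iters: int = 10
) -> Dict[str, float]:
    """Time a zero-argument callable; return per-call latency stats in seconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not timed

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "iters": float(len(samples)),
    }


def fake_generate() -> str:
    # Hypothetical stand-in for a model call (assumption, not a real API).
    time.sleep(0.001)
    return "output"


if __name__ == "__main__":
    stats = measure_latency(fake_generate, warmup=2, iters=5)
    print({name: round(value, 6) for name, value in stats.items()})
```

To compare two backends, the same harness would be called once per backend with identical prompts and generation settings. With a GPU backend, the callable should block until the output is actually available, otherwise the timer only measures kernel launch time.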

How the performance VS vLLM inference（vLLM vs Lookahead）

The idea is actually feasible. However, we have not yet tested whether our approach would make the GPU compute-bound too quickly, thereby affecting the overall throughput under...
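The compute-bound concern can be made concrete with a rough roofline-style estimate. The sketch below is a simplification, not a measurement. It assumes each decoding step reads every weight once, costs about 2 FLOPs per parameter per processed token, and ignores KV-cache and activation traffic. The function names and the hardware figures in the usage example are illustrative assumptions.

```python
def compute_bound_token_threshold(
    peak_flops: float, mem_bandwidth: float, bytes_per_param: float
) -> float:
    """Tokens per step above which a step is compute-bound in this simple model.

    Per step with T tokens and P parameters:
      FLOPs ~= 2 * P * T, bytes moved ~= P * bytes_per_param.
    The step is compute-bound when FLOPs / bytes exceeds
    peak_flops / mem_bandwidth, which gives
      T > bytes_per_param * peak_flops / (2 * mem_bandwidth).
    """
    return bytes_per_param * peak_flops / (2.0 * mem_bandwidth)


def max_batch_before_compute_bound(
    threshold_tokens: float, tokens_per_step: int
) -> int:
    """Largest batch size whose total tokens per step stay under the threshold."""
    return int(threshold_tokens // tokens_per_step)


if __name__ == "__main__":
    # Illustrative figures (assumptions): 312e12 FLOP/s peak, 2.0e12 B/s
    # memory bandwidth, 2 bytes per parameter (16-bit weights).
    threshold = compute_bound_token_threshold(312e12, 2.0e12, 2.0)
    for tokens_per_step in (1, 8, 32):
        batch = max_batch_before_compute_bound(threshold, tokens_per_step)
        print(tokens_per_step, batch)
```

Under these assumptions the threshold is about 156 tokens per step. Plain decoding, at one token per sequence per step, reaches it only at large batch sizes. A method that processes several candidate tokens per sequence per step reaches it at a proportionally smaller batch, which is the throughput risk the comment raises.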

Consider Support CodeLlama?

We will release it after testing.

Consider Support CodeLlama?

The example of CodeLlama can be found [here](https://github.com/alipay/PainlessInferenceAcceleration/blob/main/pia/lookahead/examples/codellama_example.py).