Chen Liang

Results: 4 comments by Chen Liang

We have not run a full benchmark yet. However, following the script in the vLLM repo ([benchmark_latency.py](https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py)), you can run a simple benchmark:

```python
import argparse
import time
from pathlib import Path
from...
```
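For readers who only want the shape of such a measurement, here is a minimal sketch of a latency timing loop. It is not the vLLM script and not the authors' benchmark: `benchmark_latency`, `run_once`, and `fake_generate` are hypothetical names introduced for illustration, and `fake_generate` is a stand-in that should be replaced with a real model call.

```python
import statistics
import time
from typing import Callable, List


def benchmark_latency(
    run_once: Callable[[], object],
    num_warmup: int = 2,
    num_iters: int = 5,
) -> List[float]:
    """Call `run_once` repeatedly and return per-iteration latencies in seconds."""
    # Warm-up iterations are executed but not timed, so one-off setup
    # costs (caches, lazy initialization) do not skew the results.
    for _ in range(num_warmup):
        run_once()

    latencies = []
    for _ in range(num_iters):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_generate() -> str:
    # Hypothetical stand-in for a model generation call (assumption:
    # swap this for your own inference function).
    time.sleep(0.01)
    return "output"


if __name__ == "__main__":
    results = benchmark_latency(fake_generate)
    print(f"mean latency: {statistics.mean(results):.4f} s")
```

When timing GPU inference, asynchronous kernel launches can make wall-clock numbers misleading unless the timed call waits for the device to finish (for example, by returning decoded output), so treat the numbers from a loop like this as a rough comparison only.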

The idea is actually feasible. However, we have not yet tested whether our approach would cause the GPU to become compute-bound too quickly, thereby affecting the overall throughput under...

We will release it after testing.

The example of CodeLlama can be found [here](https://github.com/alipay/PainlessInferenceAcceleration/blob/main/pia/lookahead/examples/codellama_example.py).