fade_away
My code looks like this:
```
@dataclass
class Person:
    name: str = 'John'
    age: int = 18

if __name__ == '__main__':
    import sys
    print(sys.argv)
    #main()
    parser = argparse.ArgumentParser()
    parser.add_argument('--config_path', type=str)
    ...
```
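If the goal is to fill `Person` from command-line flags, one stdlib-only approach is to derive the argparse options from the dataclass fields. This is a minimal sketch. It assumes plain `dataclasses.dataclass`, which may not be what the original code imports, and that every field has a default. `parser_from_dataclass` and `parse_person` are hypothetical helpers.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str = 'John'
    age: int = 18


def parser_from_dataclass(cls) -> argparse.ArgumentParser:
    """Hypothetical helper: add one --<field> option per dataclass field."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # Assumes f.type is a real callable such as str or int (this breaks under
        # `from __future__ import annotations`, where it becomes a string) and
        # that every field has a plain default value.
        parser.add_argument(f'--{f.name}', type=f.type, default=f.default)
    return parser


def parse_person(argv=None) -> Person:
    """Parse argv (sys.argv[1:] when None) into a Person."""
    args = parser_from_dataclass(Person).parse_args(argv)
    return Person(**vars(args))


print(parse_person(['--age', '30']))  # Person(name='John', age=30)
```

Fields not given on the command line fall back to the dataclass defaults, because those defaults are passed straight to `add_argument`.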
I built the model successfully, but I don't know how to run it from Python. Is it `python tests/chat.py`? How should I configure it? Running it fails.
Hi all, I've put a lot of effort into running this demo, but it crashes with the error below. Could anyone help?
```
./build/mlc_chat_cli --model dolly-v2-3b
Use MLC...
```
I noticed that the sampler stage launches many repeated CUDA kernels. It seems you do sampling in a for loop, launching one kernel per sequence? Why is this? BTW,...
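I don't know how this project's sampler is actually written. The NumPy sketch below only illustrates the general contrast behind the question: sampling each sequence separately (one call per row, loosely analogous to one kernel launch per sequence) versus sampling the whole batch in one vectorized pass with inverse-CDF sampling. Both function names are hypothetical.

```python
import numpy as np


def sample_per_sequence(probs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # One sampling call per row: the CPU analogue of launching one kernel per sequence.
    vocab = probs.shape[1]
    return np.array([rng.choice(vocab, p=row) for row in probs])


def sample_batched(probs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # A single vectorized pass over the whole batch using inverse-CDF sampling.
    cdf = np.cumsum(probs, axis=-1)                    # shape (batch, vocab)
    u = rng.random((probs.shape[0], 1)) * cdf[:, -1:]  # one uniform draw per row
    idx = (cdf <= u).sum(axis=-1)                      # first index where cdf > u
    return np.minimum(idx, probs.shape[1] - 1)         # guard against rounding at the top end


rng = np.random.default_rng(0)
probs = np.eye(4)[[2, 0, 3]]  # three one-hot rows, so both samplers are deterministic here
print(sample_per_sequence(probs, rng), sample_batched(probs, rng))
```

Both functions draw from the same per-row distributions; the batched version just does the work for all rows at once instead of looping in Python.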
I didn't see any documentation that mentions that.
### 📚 The doc issue
```python
from lmdeploy.messages import PytorchEngineConfig
from lmdeploy.pytorch.engine.engine import Engine

adapters = {'adapter0':'/root/.cache/huggingface/hub/models--tloen--alpaca-lora-7b/snapshots/12103d6baae1b320aa60631b38acb6ea094a0539/'}
engine_config = PytorchEngineConfig(adapters=adapters)
model_path = '/data/weilong.yu/lmdeploy/llama-7b'
engine = Engine.from_pretrained(model_path, engine_config=engine_config, trust_remote_code=True)
generator =...
```
```
from argparse_dataclass import dataclass
from argparse_dataclass import ArgumentParser

@dataclass
class SubOption:
    a: int = 1
    b: int = 2

@dataclass
class Options:
    x: int = 42
    y: bool =...
```
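I can't confirm whether `argparse_dataclass` supports a nested dataclass field like this. As a stdlib-only sketch, nested fields can be flattened into dotted flags such as `--sub.a`. The `sub` field on `Options` and the `add_fields`/`build` helpers are hypothetical. `bool` fields (like `y` above) are left out, because `type=bool` treats any non-empty string, including `'False'`, as true.

```python
import argparse
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class SubOption:
    a: int = 1
    b: int = 2


@dataclass
class Options:
    x: int = 42
    sub: SubOption = field(default_factory=SubOption)  # hypothetical nested field


def add_fields(parser: argparse.ArgumentParser, cls, prefix: str = '') -> None:
    """Add one --<prefix><name> flag per leaf field, recursing into nested dataclasses."""
    for f in fields(cls):
        name = f'{prefix}{f.name}'
        if is_dataclass(f.type):
            add_fields(parser, f.type, prefix=f'{name}.')
        else:
            # Assumes a plain callable type (int, str, ...) and a plain default value.
            parser.add_argument(f'--{name}', type=f.type, default=f.default)


def build(cls, ns: argparse.Namespace, prefix: str = ''):
    """Rebuild the (possibly nested) dataclass from the flat argparse namespace."""
    kwargs = {}
    for f in fields(cls):
        name = f'{prefix}{f.name}'
        if is_dataclass(f.type):
            kwargs[f.name] = build(f.type, ns, prefix=f'{name}.')
        else:
            # argparse keeps the dot in the dest, so 'sub.a' is read back via getattr.
            kwargs[f.name] = getattr(ns, name)
    return cls(**kwargs)


parser = argparse.ArgumentParser()
add_fields(parser, Options)
print(build(Options, parser.parse_args(['--sub.a', '5'])))  # Options(x=42, sub=SubOption(a=5, b=2))
```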
I don't see the dict case used in any of the examples.
I'm running the tutorial on an A30. I set `repeat = 1000` and return the time cost directly instead of the computed throughput. I find the Triton performance varies a...
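For reference, as far as I recall, the matmul tutorial turns the measured time into throughput using the usual FLOP count of a matmul, 2·M·N·K (one multiply and one add per inner-product term). A minimal sketch of that conversion, with a hypothetical helper name:

```python
def matmul_tflops(M: int, N: int, K: int, ms: float) -> float:
    """Convert the runtime of an (M x K) @ (K x N) matmul in milliseconds to TFLOP/s."""
    flops = 2 * M * N * K  # one multiply + one add per inner-product term
    seconds = ms * 1e-3
    return flops / seconds * 1e-12


print(matmul_tflops(4096, 4096, 4096, ms=1.0))
```

Because throughput is inversely proportional to time, a given run-to-run spread in milliseconds shows up as roughly the same relative spread in TFLOP/s.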
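```python
quantiles = [0.5, 0.2, 0.8]
if provider == 'cublas':
    # With quantiles=[0.5, 0.2, 0.8], do_bench returns the median, 20th and 80th percentile times (ms)
    ms, min_ms, max_ms = triton.testing.do_bench(lambda: torch.matmul(a, b), quantiles=quantiles)
```
I see that the code here uses the median time cost...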
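With `quantiles=[0.5, 0.2, 0.8]`, my understanding is that `do_bench` returns the median plus the 20th and 80th percentiles of the measured runs, so a single noisy run cannot dominate the reported number. The same reduction on plain wall-clock timings can be sketched with the standard library. `bench` is a hypothetical, CPU-only helper: it does none of the GPU synchronization or cache flushing that `triton.testing.do_bench` handles for GPU kernels.

```python
import statistics
import time


def bench(fn, repeat: int = 100):
    """Hypothetical helper: time fn() `repeat` times, return (median, p20, p80) in ms."""
    fn()  # one warmup call so one-time setup costs are not measured
    times_ms = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    # n=10 gives the 9 decile cut points: index 1 is p20, index 7 is p80.
    # 'inclusive' uses linear interpolation between sorted samples.
    deciles = statistics.quantiles(times_ms, n=10, method='inclusive')
    return statistics.median(times_ms), deciles[1], deciles[7]


median_ms, p20_ms, p80_ms = bench(lambda: sum(range(10_000)), repeat=200)
print(f'median={median_ms:.4f} ms  p20={p20_ms:.4f} ms  p80={p80_ms:.4f} ms')
```

Reporting the p20..p80 band next to the median makes run-to-run variation visible instead of hiding it in one number.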