fade_away issues

Results: 58 issues of fade_away

support for -h raises an error

My code is like this:
```
@dataclass
class Person:
    name: str = 'John'
    age: int = 18

if __name__ == '__main__':
    import sys
    print(sys.argv)
    #main()
    parser = argparse.ArgumentParser()
    parser.add_argument('--config_path', type=str)...
```
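For comparison, the sketch below uses only the standard library. It builds an `argparse` parser from a dataclass and does not use argparse_dataclass. `build_parser` and `parse_person` are hypothetical helpers, and the sketch assumes every field has a default. With plain `argparse`, `-h` prints the help text and then raises `SystemExit(0)` by design. A caller that treats any `SystemExit` as a failure may report this as an error. That is one possible explanation, not a diagnosis of this issue.

```python
import argparse
import dataclasses
from dataclasses import dataclass


@dataclass
class Person:
    name: str = 'John'
    age: int = 18


def build_parser(cls) -> argparse.ArgumentParser:
    """Hypothetical helper: one --<field> option per dataclass field.

    Assumes every field has a plain default and a callable type (str, int, ...).
    """
    parser = argparse.ArgumentParser(description=f"Options for {cls.__name__}")
    for f in dataclasses.fields(cls):
        parser.add_argument(
            f"--{f.name}",
            type=f.type,          # e.g. str or int, used to convert the CLI string
            default=f.default,    # dataclass default becomes the argparse default
            help=f"(default: {f.default!r})",
        )
    return parser


def parse_person(argv):
    """Parse an explicit argv list (not sys.argv) into a Person."""
    args = build_parser(Person).parse_args(argv)
    return Person(**vars(args))


# Demo: -h prints the help text, then argparse exits by raising SystemExit(0).
try:
    build_parser(Person).parse_args(['-h'])
except SystemExit as exc:
    print('argparse exited with code', exc.code)
```

A call such as `parse_person(['--name', 'Ann', '--age', '30'])` builds `Person(name='Ann', age=30)`. To keep the script running after `-h`, catch `SystemExit` around `parse_args`.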

How to run this using Python?

I built the model successfully, but I don't know how to run it using Python. Is it `python tests/chat.py`? How do I configure it? It fails to run.

feature request

Unknown conversation template: dolly

Hi all, I put a lot of effort into running this demo, but it crashes with this error. Could anyone help?
```
./build/mlc_chat_cli --model dolly-v2-3b
Use MLC...
```

trouble shooting

Question about the sampler: it takes too much time

I noticed that the sampler stage launches many repeated CUDA kernels. It seems you sample in a for loop, launching the kernels once per sequence? Why is this? BTW,...

Is pipeline parallelism supported?

I didn't see any documentation that mentions that.

[Docs] got an unexpected keyword argument 'enable_lora'

### 📚 The doc issue
```python
from lmdeploy.messages import PytorchEngineConfig
from lmdeploy.pytorch.engine.engine import Engine

adapters = {'adapter0': '/root/.cache/huggingface/hub/models--tloen--alpaca-lora-7b/snapshots/12103d6baae1b320aa60631b38acb6ea094a0539/'}
engine_config = PytorchEngineConfig(adapters=adapters)
model_path = '/data/weilong.yu/lmdeploy/llama-7b'
engine = Engine.from_pretrained(model_path, engine_config=engine_config, trust_remote_code=True)
generator =...
```

Are nested dataclasses supported?

```
from argparse_dataclass import dataclass
from argparse_dataclass import ArgumentParser

@dataclass
class SubOption:
    a: int = 1
    b: int = 2

@dataclass
class Options:
    x: int = 42
    y: bool =...
```
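Whatever argparse_dataclass itself supports, a nested dataclass can be mapped onto plain `argparse` by flattening it into dotted options such as `--sub.a`. The sketch below does this with the standard library only. Because the original snippet is truncated, `y`'s default of `False` and the `sub: SubOption` field are assumptions. `add_fields`, `build`, and `parse` are hypothetical helpers.

```python
import argparse
import dataclasses
from dataclasses import dataclass, field


@dataclass
class SubOption:
    a: int = 1
    b: int = 2


@dataclass
class Options:
    x: int = 42
    y: bool = False                                         # assumed default (original is truncated)
    sub: SubOption = field(default_factory=SubOption)       # hypothetical nested field


def _str_to_bool(text: str) -> bool:
    # type=bool would turn any non-empty string (even 'false') into True
    if text.lower() in ('1', 'true', 'yes'):
        return True
    if text.lower() in ('0', 'false', 'no'):
        return False
    raise argparse.ArgumentTypeError(f'not a boolean: {text!r}')


def add_fields(parser, cls, prefix=''):
    """Register one --prefix.name option per leaf field, recursing into nested dataclasses."""
    for f in dataclasses.fields(cls):
        name = prefix + f.name
        if dataclasses.is_dataclass(f.type):
            add_fields(parser, f.type, prefix=name + '.')
        else:
            kind = _str_to_bool if f.type is bool else f.type
            # SUPPRESS: the option only appears in the namespace if given on the CLI
            parser.add_argument(f'--{name}', type=kind, default=argparse.SUPPRESS)


def build(cls, values, prefix=''):
    """Rebuild the (possibly nested) dataclass from flat 'prefix.name' values."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        name = prefix + f.name
        if dataclasses.is_dataclass(f.type):
            kwargs[f.name] = build(f.type, values, prefix=name + '.')
        elif name in values:
            kwargs[f.name] = values[name]
    return cls(**kwargs)  # omitted options keep their dataclass defaults


def parse(cls, argv):
    parser = argparse.ArgumentParser()
    add_fields(parser, cls)
    return build(cls, vars(parser.parse_args(argv)))
```

For example, `parse(Options, ['--x', '7', '--sub.b', '5'])` yields `Options(x=7, y=False, sub=SubOption(a=1, b=5))`. Dotted names keep nested options from colliding with top-level ones that share a field name.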

Is the dict type supported?

I don't see a dict field used in any of the examples.
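One standard-library way to accept a dict on the command line is to pass it as a JSON string and convert it with an `argparse` `type` callable. This is a general workaround, not a description of argparse_dataclass behaviour. `json_dict`, `Config`, and its `extra` field are hypothetical names.

```python
import argparse
import json
from dataclasses import dataclass, field


def json_dict(text: str) -> dict:
    """argparse 'type' callable: parse a JSON object, rejecting other JSON values."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f'invalid JSON: {exc}') from exc
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError('expected a JSON object')
    return value


@dataclass
class Config:
    # hypothetical dict-valued field; default_factory avoids a shared mutable default
    extra: dict = field(default_factory=dict)


def parse_config(argv):
    parser = argparse.ArgumentParser()
    # a fresh {} per call; argparse only runs `type` on string defaults
    parser.add_argument('--extra', type=json_dict, default={})
    return Config(**vars(parser.parse_args(argv)))
```

Usage: `parse_config(['--extra', '{"lr": 0.1}'])` gives `Config(extra={'lr': 0.1})`. Input that is not a JSON object makes `argparse` print a usage error and exit with status 2.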

Triton performance varies from run to run

I'm running the tutorial on an A30. I set repeat = 1000 and return the time cost directly instead of the computed throughput. ![image](https://github.com/openai/triton/assets/26128514/b7abf0bb-52fc-41c2-986d-6a6c61cea80a) I find the Triton performance varies a...

Why does the benchmark use the median instead of the average?

```python
quantiles = [0.5, 0.2, 0.8]
if provider == 'cublas':
    ms, min_ms, max_ms = triton.testing.do_bench(lambda: torch.matmul(a, b), quantiles=quantiles)
```
Looking at this code, it uses the median time cost...
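As a general statistics point, and not a statement of Triton's documented rationale, the median is robust to occasional slow runs while the mean is not. The sketch below uses the standard `statistics` module. It reports the median plus the 20th and 80th percentiles, mirroring `quantiles=[0.5, 0.2, 0.8]`. The timings are synthetic, made up for illustration, and `robust_summary` is a hypothetical helper.

```python
import statistics


def robust_summary(times_ms):
    """Median plus 20th/80th percentiles of per-run timings (in ms)."""
    cuts = statistics.quantiles(times_ms, n=10)  # 9 decile cut points: index 1 = p20, index 7 = p80
    return statistics.median(times_ms), cuts[1], cuts[7]


# Synthetic timings (illustrative only): nine steady runs and one slow outlier.
times = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0]
print('mean:', statistics.mean(times))
print('median, p20, p80:', robust_summary(times))
```

Here the single outlier pulls the mean well above the typical run time, while the median and the p20/p80 band still describe the steady-state runs.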