Maykeye

Results: 31 comments of Maykeye

After running some `git bisect`, it seems problems were introduced at commit 1566d8e34425937e6182d4425e7b85c370912d64 (Add model settings to the Models tab)

I had the same question [yesterday](https://discuss.huggingface.co/t/why-models-llama-in-particular-upcasts-softmax-to-fp32/44787). Can we make it optional? At least BF16 softmax is good enough. And by "good enough" I mean it "does not crash" at long context...
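The trade-off above can be sketched as a small helper. This is a minimal illustration, not Llama's actual code; the function name `softmax_maybe_upcast` is hypothetical, and the assumption is only that the upcast path mirrors the common pattern of computing softmax in fp32 and casting back:

```python
import torch

def softmax_maybe_upcast(scores: torch.Tensor, upcast: bool = True) -> torch.Tensor:
    """Attention softmax, with the fp32 upcast made optional."""
    if upcast:
        # Upcast to fp32 for numerical stability, then cast back to the
        # input dtype. This temporarily holds an fp32 copy of the scores,
        # which hurts memory use at long context.
        return torch.softmax(scores, dim=-1, dtype=torch.float32).to(scores.dtype)
    # Stay in the input dtype (e.g. bfloat16): less precise, but no fp32 copy.
    return torch.softmax(scores, dim=-1)

scores = torch.randn(2, 8, 128, 128, dtype=torch.bfloat16)
probs = softmax_maybe_upcast(scores, upcast=False)
assert probs.dtype == torch.bfloat16  # never left bf16
```

Making `upcast` a flag would let long-context users pick memory savings over the last bit of precision.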

Yes and quantized models produce noticeably different results.

Custom types aren't added as often as they are used, so having a diagnostic would be good. When I mistyped "FlOAT" after copying a node that makes a const int, I...

```
INFO:     Started server process [1310434]
INFO:     Waiting for application startup.
torch found: /home/fella/src/sd/sd/lib/python3.11/site-packages/torch/lib
torch set
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
max_tokens=4100...
```

> Good idea! We haven't experimented with it. To be honest I think these differences tend to get washed out with scale, but maybe not. [Paper in point](https://arxiv.org/pdf/2310.04564.pdf). Switching activation...

> What is the meaning behind them being good places to inject adapters?

Long story: arxiv:1902.00751. Short story: if LoRA replaces `XW` with `XW + XAB` and sees itself more...
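The `XW + XAB` replacement mentioned above can be shown as a minimal sketch, assuming the standard LoRA recipe (frozen base weight, low-rank `A` and `B` with `B` zero-initialized); the class name `LoRALinear` is illustrative, not from any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x @ W^T + (x @ A @ B) * scale, with W frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight W
        # Low-rank factors: A is small random, B is zero, so at init the
        # adapter contributes nothing and the layer behaves exactly like W.
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(16, 32))
x = torch.randn(4, 16)
# B starts at zero, so the adapted layer initially matches the base layer.
assert torch.allclose(layer(x), layer.base(x))
```

Only `A` and `B` receive gradients, which is why the choice of *where* to place such adapters (arxiv:1902.00751) matters so much.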

> Can you explain the use case here. Would this be like if the model is handling topic a, we're using and updating state a for each inference?

Yes, manual cache...

Similar. It's about manual control over every aspect of the cache (and hence the state) of the model. [The model itself](https://github.com/state-spaces/mamba/blob/009bec5ee37f586844a3fc89c040a9c1a9d8badf/mamba_ssm/models/mixer_seq_simple.py#L233) uses InferenceParams.
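The per-topic usage described above can be sketched generically. This is not the `mamba_ssm` API; `ManualStateCache` and `step` are hypothetical stand-ins, assuming only that the model exposes its recurrent state as a tensor the caller can swap in and out:

```python
import torch

class ManualStateCache:
    """Keep one recurrent state per topic; the caller, not the model,
    decides which state each inference reads and updates."""
    def __init__(self):
        self.states: dict[str, torch.Tensor] = {}

    def get(self, topic: str, d_state: int = 16) -> torch.Tensor:
        # Fresh zero state the first time a topic is seen.
        return self.states.setdefault(topic, torch.zeros(d_state))

    def put(self, topic: str, state: torch.Tensor) -> None:
        self.states[topic] = state

def step(state: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # Stand-in for one recurrence step: decay old state, mix in new input.
    return 0.9 * state + 0.1 * x

cache = ManualStateCache()
s = cache.get("topic_a")          # load state for topic a
s = step(s, torch.ones(16))       # run one inference step on it
cache.put("topic_a", s)           # store the updated state for next time
```

Handling "topic b" would simply mean `cache.get("topic_b")`, leaving topic a's state untouched.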