Andrey36652
@ggerganov "so for start we can do it just on the model tensors. The intermediate tensors during the evaluation can remain quantized using the existing approach, so that the evaluation...
It might be worth reading A Survey of Quantization Methods for Efficient Neural Network Inference: https://arxiv.org/pdf/2103.13630.pdf
> I came up with a script that's able to compute RMS for various quantization methods - maybe it will come handy for experimenting: https://gist.github.com/prusnak/f54f8f33503458ca1aa9883f71897072 @prusnak Do you know which...
@ggerganov Relevant paper https://arxiv.org/pdf/2303.08112.pdf 
Possibly related: https://github.com/Cohee1207/SillyTavern/issues/317
@internlm-team Could you at least tell us the number of parameters? :)
I'm +1 on this feature. I have very little knowledge of plugin development for Grafana. How hard would it be to parse the existing user query and inject a stream context pipe...
It seems I've found the reason. `async-compression` **by default** will report the stream as "Done" as soon as GzipDecoder has finished its work. https://github.com/Nullus157/async-compression/blob/9880aa99b264825e5c6902b2e25132e6bf2f74b9/src/tokio/bufread/generic/decoder.rs#L91 But one can change this behaviour...
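For reference, a minimal sketch of opting out of that default via `multiple_members`, assuming `async-compression` with the `tokio` and `gzip` features and a tokio runtime; the in-memory bytes here are just a stand-in for a real response body:

```rust
use async_compression::tokio::bufread::{GzipDecoder, GzipEncoder};
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Produce some gzipped bytes in memory so the example is self-contained.
    let mut encoder = GzipEncoder::new(&b"hello gzip"[..]);
    let mut compressed = Vec::new();
    encoder.read_to_end(&mut compressed).await?;

    let mut decoder = GzipDecoder::new(&compressed[..]);
    // Default behaviour: the decoder signals EOF once the first gzip member
    // ends. Enabling multiple_members makes it keep reading any following
    // members instead of reporting the stream as "Done" early.
    decoder.multiple_members(true);

    let mut out = Vec::new();
    decoder.read_to_end(&mut out).await?;
    assert_eq!(out, b"hello gzip");
    Ok(())
}
```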
https://crates.io/crates/bigdecimal should be slower (significantly?) as it deals with arbitrary-precision decimals. It uses an i64 to store the scale and a BigInt (from the num-bigint crate) to store the value. BigInt uses BigUint, which...
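A small illustration of that layout (unscaled BigInt digits plus an i64 scale), assuming the bigdecimal and num-bigint crates as dependencies:

```rust
use bigdecimal::BigDecimal;
use num_bigint::BigInt;
use std::str::FromStr;

fn main() {
    // 1.23 represented as unscaled value 123 with scale 2, i.e. 123 * 10^-2.
    let from_parts = BigDecimal::new(BigInt::from(123), 2);
    let from_str = BigDecimal::from_str("1.23").unwrap();
    assert_eq!(from_parts, from_str);

    // Arithmetic operates on the heap-backed BigInt digits, which is where
    // the expected slowdown relative to fixed-width decimal types comes from.
    let product = &from_parts * &from_str;
    println!("{}", product); // 1.5129
}
```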
@yxxhero When I execute `helm plugin install https://github.com/databus23/helm-diff` I encounter the following error:
```
Error: exec: "sh": executable file not found in %PATH%
```
A `helm-diff` folder is created in...