Andrey36652
@ggerganov "so for start we can do it just on the model tensors. The intermediate tensors during the evaluation can remain quantized using the existing approach, so that the evaluation...
It might be worth reading A Survey of Quantization Methods for Efficient Neural Network Inference: https://arxiv.org/pdf/2103.13630.pdf
> I came up with a script that's able to compute RMS for various quantization methods - maybe it will come handy for experimenting: https://gist.github.com/prusnak/f54f8f33503458ca1aa9883f71897072 @prusnak Do you know which...
@ggerganov Relevant paper https://arxiv.org/pdf/2303.08112.pdf 
Possibly related: https://github.com/Cohee1207/SillyTavern/issues/317
@internlm-team Could you at least tell us the number of parameters? :)
I'm +1 on this feature. I have very little knowledge of plugin development for Grafana. How hard would it be to parse the existing user query and inject a stream context pipe...
It seems I've found the reason. `async-compression` **by default** will report the stream as "Done" as soon as GzipDecoder has finished its work. https://github.com/Nullus157/async-compression/blob/9880aa99b264825e5c6902b2e25132e6bf2f74b9/src/tokio/bufread/generic/decoder.rs#L91 But one can change this behaviour...
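For reference, a minimal sketch of opting out of that default via `multiple_members`, assuming `async-compression` with the `tokio` and `gzip` features and a tokio runtime; the in-memory bytes here are just a stand-in for a real response body:

```rust
use async_compression::tokio::bufread::{GzipDecoder, GzipEncoder};
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Produce some gzipped bytes in memory so the example is self-contained.
    let mut encoder = GzipEncoder::new(&b"hello gzip"[..]);
    let mut compressed = Vec::new();
    encoder.read_to_end(&mut compressed).await?;

    let mut decoder = GzipDecoder::new(&compressed[..]);
    // Default behaviour: the decoder signals EOF once the first gzip member
    // ends. Enabling multiple_members makes it keep reading any following
    // members instead of reporting the stream as "Done" early.
    decoder.multiple_members(true);

    let mut out = Vec::new();
    decoder.read_to_end(&mut out).await?;
    assert_eq!(out, b"hello gzip");
    Ok(())
}
```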
https://crates.io/crates/bigdecimal should be slower (significantly?) as it deals with arbitrary-precision decimals. It uses an i64 to store the scale and a BigInt (from the num-bigint crate) to store the value. BigInt uses BigUint, which...
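A small illustration of that layout (unscaled BigInt digits plus an i64 scale), assuming the bigdecimal and num-bigint crates as dependencies:

```rust
use bigdecimal::BigDecimal;
use num_bigint::BigInt;
use std::str::FromStr;

fn main() {
    // 1.23 represented as unscaled value 123 with scale 2, i.e. 123 * 10^-2.
    let from_parts = BigDecimal::new(BigInt::from(123), 2);
    let from_str = BigDecimal::from_str("1.23").unwrap();
    assert_eq!(from_parts, from_str);

    // Arithmetic operates on the heap-backed BigInt digits, which is where
    // the expected slowdown relative to fixed-width decimal types comes from.
    let product = &from_parts * &from_str;
    println!("{}", product); // 1.5129
}
```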
@yxxhero When I execute `helm plugin install https://github.com/databus23/helm-diff` I encounter the following error:
```
Error: exec: "sh": executable file not found in %PATH%
```
A `helm-diff` folder is created in...