Eric Buehler

Results: 136 issues by Eric Buehler

We should distinguish between two cases in `api_get_file!`:
- 404: read from local
- Anything else: propagate the error

Currently, if the "error" is not a 404, we will still attempt reading...
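A minimal sketch of the intended control flow, not the actual `api_get_file!` macro: the `FetchError` type and `fetch_from_hub` helper are hypothetical stand-ins for the Hub client's real error and request types.

```rust
use std::path::PathBuf;

// Hypothetical error type standing in for the Hub client's error; the real
// macro would inspect the status of the actual API error it receives.
enum FetchError {
    NotFound,      // HTTP 404
    Other(String), // anything else: network failure, auth, 5xx, ...
}

// Placeholder for the actual Hub request performed by `api_get_file!`.
fn fetch_from_hub(_remote: &str) -> Result<PathBuf, FetchError> {
    unimplemented!()
}

fn get_file(remote: &str, local: &PathBuf) -> Result<PathBuf, FetchError> {
    match fetch_from_hub(remote) {
        Ok(path) => Ok(path),
        // 404: the file is simply not on the Hub, so fall back to a local read.
        Err(FetchError::NotFound) => Ok(local.clone()),
        // Anything else is a real failure and should be propagated,
        // not silently turned into a local read.
        Err(e) => Err(e),
    }
}
```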

This is currently blocked on having a way to do top-k in Candle.

new feature
models

This will allow loading very large models onto the CPU and then applying ISQ when placing them onto the device.

new feature
backend
models

Please let us know what model architectures you would like to see added! **Up-to-date todo list below.** Please feel free to contribute any model; a PR without device...

models

- [ ] RowParallelLinear
- [ ] MergedColumnParallelLinear
- [ ] QKVParallelLinear
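These follow the Megatron/vLLM-style tensor-parallel linear layers. As a rough sketch (not the planned mistral.rs implementation), a row-parallel layer shards the weight's input dimension across ranks and sums the partial products; the Candle-based shapes and the placeholder all-reduce below are assumptions.

```rust
use candle_core::{Result, Tensor};

/// Row-parallel linear layer sketch: each rank holds a shard of shape
/// [out_features, in_features / world_size], computes a partial matmul, and
/// the partial outputs are summed across ranks. The collective backend is
/// not specified in this issue, so the all-reduce is left as a comment.
struct RowParallelLinear {
    weight_shard: Tensor,
}

impl RowParallelLinear {
    fn forward(&self, x_shard: &Tensor) -> Result<Tensor> {
        // x_shard: [batch, in_features / world_size] — the caller hands each
        // rank its slice of the input features.
        let partial = x_shard.matmul(&self.weight_shard.t()?)?;
        // In a real implementation this would be an all-reduce (sum) across
        // ranks; with world_size == 1 the partial result is already complete.
        Ok(partial)
    }
}
```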

paged-attention
backend

Refs and closes #215.

# API addition
- DeviceMapper
- All at-loading-time methods have a `loading_isq` parameter
- Add `fn set_nm_device(..., loading_isq: bool) -> VarBuilder`
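A rough sketch of how such a mapper-side hook could look; the only details taken from this description are the `set_nm_device` name and the `loading_isq` flag, so the trait shape, the `VarBuilder` parameter, and the documented behavior are assumptions rather than the actual mistral.rs API.

```rust
use candle_nn::VarBuilder;

/// Hypothetical device-mapping hook. Everything beyond the method name and
/// the `loading_isq` flag is an assumption for illustration.
trait DeviceMapperSketch {
    /// Return a `VarBuilder` placed on the non-mapped device. When
    /// `loading_isq` is true, weights could instead stay on the load device
    /// (e.g. the CPU) so ISQ can quantize them before the final move.
    fn set_nm_device<'a>(&self, vb: VarBuilder<'a>, loading_isq: bool) -> VarBuilder<'a>;
}
```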

backend
models

# Description

In this PR, I have added support for the [`mistral.rs`](https://github.com/EricLBuehler/mistral.rs) LLM inference platform via a new integration. `mistral.rs` is a new LLM inference platform with key features such...

size:XL

Argsort was just added to Candle (https://github.com/huggingface/candle/pull/2132). Using an argsort kernel will accelerate the sorting step of `topk` or `topp` sampling, which is currently done on the CPU and takes a lot of time.
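For example, a top-k selection could stay on-device; a small sketch assuming the `arg_sort_last_dim` API introduced in that PR (the `topk_indices` helper is illustrative, not the mistral.rs sampler code):

```rust
use candle_core::{Device, Result, Tensor};

/// Return the indices of the k largest logits using the argsort kernel,
/// instead of copying logits to the CPU and sorting there.
fn topk_indices(logits: &Tensor, k: usize) -> Result<Tensor> {
    // Sort indices in descending order along the last dimension on-device.
    let sorted_idx = logits.arg_sort_last_dim(false)?;
    // Keep only the first k indices (the k largest logits).
    sorted_idx.narrow(candle_core::D::Minus1, 0, k)
}

fn main() -> Result<()> {
    let logits = Tensor::new(&[0.1f32, 2.0, -1.0, 0.7, 3.5], &Device::Cpu)?;
    let top3 = topk_indices(&logits, 3)?;
    println!("{top3}"); // indices of the three largest logits
    Ok(())
}
```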

optimization

Speculative decoding: https://arxiv.org/pdf/2211.17192

This will refactor the pipeline structure to make the sampling process more abstract. It will also abstract the scheduling and KV cache management.

# Restriction
-...
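For reference, the core accept/reject rule from that paper as a self-contained sketch in plain Rust; this only illustrates the algorithm, not the pipeline/scheduler refactor described here, and the toy distributions and uniform samples are stand-ins.

```rust
// Accept/reject rule from the speculative decoding paper
// (https://arxiv.org/pdf/2211.17192): a draft model proposes tokens, the
// target model scores the whole run in one pass, and each drafted token is
// accepted with probability min(1, p/q). On the first rejection we resample
// from the residual distribution max(0, p - q) and stop.

/// Inverse-CDF sampling from a normalized distribution.
fn sample(dist: &[f32], r: f32) -> usize {
    let mut acc = 0.0;
    for (i, &p) in dist.iter().enumerate() {
        acc += p;
        if r < acc {
            return i;
        }
    }
    dist.len() - 1
}

/// `drafted[i]` is the token the draft model chose at step i, `draft[i]` and
/// `target[i]` are the two models' distributions at that step, and `rand[i]`
/// is a uniform(0, 1) sample. Returns the accepted prefix, plus a correction
/// token when a draft token is rejected.
fn verify(
    drafted: &[usize],
    draft: &[Vec<f32>],
    target: &[Vec<f32>],
    rand: &[f32],
) -> Vec<usize> {
    let mut out = Vec::new();
    for i in 0..drafted.len() {
        let tok = drafted[i];
        let (p, q) = (target[i][tok], draft[i][tok]);
        if rand[i] < (p / q).min(1.0) {
            // Accepted: the target model agrees closely enough with the draft.
            out.push(tok);
        } else {
            // Rejected: resample from the normalized residual max(0, p - q).
            let residual: Vec<f32> = target[i]
                .iter()
                .zip(&draft[i])
                .map(|(p, q)| (p - q).max(0.0))
                .collect();
            let z: f32 = residual.iter().sum();
            let renorm: Vec<f32> = residual.iter().map(|v| v / z).collect();
            out.push(sample(&renorm, rand[i]));
            break;
        }
    }
    out
}
```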

new feature
backend
models

Also enable logging for pyo3 bindings.