Eric Buehler
We should distinguish between two cases in `api_get_file!`:

- 404: read from local
- Anything else: propagate the error

Currently, if the "error" is not 404, we will still attempt reading...
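As a rough sketch of the intended control flow (the error enum, `fetch_remote`, and `read_local` below are hypothetical stand-ins, not the actual `api_get_file!` internals):

```rust
use std::path::{Path, PathBuf};

// Hypothetical error type standing in for the hub API error.
#[derive(Debug)]
enum FetchError {
    NotFound,      // HTTP 404 from the hub
    Other(String), // any other failure (network, auth, ...)
}

// Placeholder for the actual hub request.
fn fetch_remote(_file: &str) -> Result<PathBuf, FetchError> {
    Err(FetchError::NotFound)
}

// Placeholder local fallback.
fn read_local(file: &str) -> Result<PathBuf, FetchError> {
    let p = Path::new(file).to_path_buf();
    if p.exists() {
        Ok(p)
    } else {
        Err(FetchError::Other(format!("no local copy of {file}")))
    }
}

fn get_file(file: &str) -> Result<PathBuf, FetchError> {
    match fetch_remote(file) {
        Ok(p) => Ok(p),
        // 404: the file simply is not on the hub, so fall back to a local read.
        Err(FetchError::NotFound) => read_local(file),
        // Anything else (auth failure, network error, ...): propagate instead of
        // silently attempting the local read.
        Err(e) => Err(e),
    }
}

fn main() {
    match get_file("config.json") {
        Ok(p) => println!("using {}", p.display()),
        Err(e) => eprintln!("error: {e:?}"),
    }
}
```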
This is currently pending a way to do `topk` in Candle.
This will allow loading very large models onto the CPU and then applying ISQ onto the device.
Model Wishlist
Please let us know which model architectures you would like to see added! **Up-to-date todo list below.** Please feel free to contribute any model; a PR without device...
- [ ] RowParallelLinear
- [ ] MergedColumnParallelLinear
- [ ] QKVParallelLinear
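For context, a dependency-free toy sketch of the row-parallel case: the weight's input dimension is split across shards, each shard computes a partial product on its slice of the input, and the partials are summed (an all-reduce in a real tensor-parallel setup). Names and shapes here are illustrative only:

```rust
// Toy row-parallel linear: weight rows are [out][in_shard]; each shard multiplies
// its slice of the input, and the partial outputs are summed across shards.
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
        .collect()
}

fn main() {
    // Full weight: 2 outputs x 4 inputs, split into two shards of 2 inputs each.
    let w_shard0 = vec![vec![1.0, 2.0], vec![0.5, 0.0]];
    let w_shard1 = vec![vec![3.0, 4.0], vec![0.0, 0.5]];
    let x = [1.0f32, 1.0, 1.0, 1.0];
    let (x0, x1) = x.split_at(2);

    let partial0 = matvec(&w_shard0, x0);
    let partial1 = matvec(&w_shard1, x1);

    // "All-reduce": sum the partial results to recover the full output.
    let y: Vec<f32> = partial0.iter().zip(&partial1).map(|(a, b)| a + b).collect();
    println!("{y:?}"); // [10.0, 1.0]
}
```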
Refs and closes #215.

# API addition

- DeviceMapper
  - All at-loading-time methods have a `loading_isq` parameter
  - Add `fn set_nm_device(..., loading_isq: bool) -> VarBuilder`
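A minimal, self-contained sketch of what such a mapper method could look like; the local `VarBuilder` is only a placeholder stand-in, and the real trait in mistral.rs may differ in signature and semantics:

```rust
// Placeholder stand-in for the real candle_nn VarBuilder.
#[derive(Clone)]
struct VarBuilder {
    device: String,
}

// Hypothetical sketch of the device-mapping trait: at-loading-time methods take a
// `loading_isq` flag so weights destined for ISQ can stay on the CPU and be moved
// to the target device only after quantization.
trait DeviceMapper {
    fn set_nm_device(&self, vb: VarBuilder, loading_isq: bool) -> VarBuilder;
}

struct SimpleMapper {
    target: String,
}

impl DeviceMapper for SimpleMapper {
    fn set_nm_device(&self, mut vb: VarBuilder, loading_isq: bool) -> VarBuilder {
        // When loading for ISQ, keep the weights on the CPU; otherwise place them
        // directly on the mapped device.
        vb.device = if loading_isq { "cpu".to_string() } else { self.target.clone() };
        vb
    }
}

fn main() {
    let mapper = SimpleMapper { target: "cuda:0".to_string() };
    let vb = VarBuilder { device: "cpu".to_string() };
    let vb = mapper.set_nm_device(vb, true);
    println!("weights will load on: {}", vb.device);
}
```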
# Description

In this PR, I have added a new integration for the [`mistral.rs`](https://github.com/EricLBuehler/mistral.rs) LLM inference platform. `mistral.rs` is a new LLM inference platform with key features such...
Argsort was just added to Candle (https://github.com/huggingface/candle/pull/2132). Using an argsort kernel will accelerate the CPU sorting currently used for `topk` and `topp` sampling, which takes a lot of time.
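For illustration, a dependency-free sketch of argsort-based top-k filtering of logits; the actual change would use Candle's new argsort kernel on-device rather than sorting on the CPU like this:

```rust
// Argsort-based top-k: order indices by descending logit, keep the first k.
fn topk_indices(logits: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_unstable_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);
    idx
}

fn main() {
    let logits = [0.1f32, 2.5, -1.0, 0.7, 3.2];
    // Keep only the 3 most likely tokens before sampling.
    println!("{:?}", topk_indices(&logits, 3)); // [4, 1, 3]
}
```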
Speculative decoding: https://arxiv.org/pdf/2211.17192

This will refactor the pipeline structure to make the sampling process more abstracted. It will also abstract the scheduling and KV cache management.

# Restriction

- ...
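For reference, a toy sketch of the draft-then-verify loop from the linked paper, with closures standing in for the draft and target models; the real scheme samples from both distributions and uses an acceptance ratio rather than exact greedy agreement:

```rust
// One speculative decoding step: draft `gamma` tokens with the cheap model, then
// verify them with the target model, keeping the longest agreeing prefix.
fn speculative_step(
    draft: &dyn Fn(&[u32]) -> u32,  // cheap draft model: next token for a prefix
    target: &dyn Fn(&[u32]) -> u32, // expensive target model: next token for a prefix
    prefix: &mut Vec<u32>,
    gamma: usize,                   // number of tokens drafted per step
) {
    // 1) Draft `gamma` tokens autoregressively with the cheap model.
    let mut ctx = prefix.clone();
    let mut drafted = Vec::with_capacity(gamma);
    for _ in 0..gamma {
        let t = draft(ctx.as_slice());
        ctx.push(t);
        drafted.push(t);
    }
    // 2) Verify: accept drafted tokens while the target agrees, then emit the
    //    target's own token at the first disagreement.
    for &t in &drafted {
        let expected = target(prefix.as_slice());
        if expected == t {
            prefix.push(t); // accepted draft token
        } else {
            prefix.push(expected); // target's correction; stop this step
            return;
        }
    }
    // All drafts accepted: the target still yields one extra token for free.
    let bonus = target(prefix.as_slice());
    prefix.push(bonus);
}

fn main() {
    // Toy models: the draft repeats the last token; the target increments it.
    let draft = |ctx: &[u32]| *ctx.last().unwrap_or(&0);
    let target = |ctx: &[u32]| *ctx.last().unwrap_or(&0) + 1;
    let mut prefix = vec![1u32];
    speculative_step(&draft, &target, &mut prefix, 4);
    println!("{prefix:?}");
}
```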
Also enable logging for pyo3 bindings.