Joel Lamy-Poirier
Some benchmarking results, comparing several implementations:

1. `flash`: `flash_santacoder`, the current implementation.
2. `causal`: The `gpt_bigcode` model from HF transformers, run with `causal_lm`.
3. `vector`: The `gpt_bigcode` model from HF...
### Starcoder decode

* Similar to Santacoder, but `flash` is already inefficient at a batch size of 1, often even worse than `causal`.
* Latency for small batch sizes is...
> @jlamypoirier Thanks for great investigation.

Add support for cuda graphs, at least for decode. I already showed them to work with dynamic shapes (using a lot of graphs), and...
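The "lot of graphs" approach for dynamic shapes amounts to capturing one graph per shape bucket and replaying the cached one on subsequent calls. A minimal sketch of that caching pattern (with a placeholder standing in for the actual `torch.cuda.CUDAGraph` capture):

```python
graphs = {}

def capture_graph(batch_size):
    # Stand-in for capturing a CUDA graph at a fixed shape
    # (the real code would use torch.cuda.CUDAGraph / torch.cuda.graph).
    return ("captured_graph", batch_size)

def run_decode(batch_size):
    # Capture lazily the first time a batch size is seen, then replay.
    if batch_size not in graphs:
        graphs[batch_size] = capture_graph(batch_size)
    return graphs[batch_size]

# The second call with the same shape reuses the cached graph.
assert run_decode(4) is run_decode(4)
```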
> @jlamypoirier Amazing reports !! May I ask does sequence length indicate max_new_token? I got pretty high latency (about 4s) for starcoder when I set max_new_token=128

It's the time to...
We'd have to find out where the time is being spent; I suspect tokenization. It's a Hugging Face thing, so we don't have much control over it, but it seems...
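Finding out where the time goes can be as simple as timing each stage separately. A rough sketch with a stand-in function (in practice one would time `tokenizer(prompt)` and the generation call independently; the names here are illustrative):

```python
import time

def average_time(fn, *args, repeats=100):
    # Average wall-clock time of fn over several runs.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

# Stand-in for the real tokenizer call; swap in the actual
# Hugging Face tokenizer to measure the real pipeline.
def tokenize(text):
    return text.split()

tokenize_avg = average_time(tokenize, "def hello_world(): pass")
assert tokenize_avg >= 0.0
```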
> `fast-llm type=GPTTrainer` is principled (because it taps into the override logic) but ugly (because spelling out `type=` is mandatory and because it's using class names as values). I think...
@tscholak I started working on more dynamic classes and realized user-friendly names are essential. So I implemented a simple solution where each class can have its own registry, and the...
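A per-class registry keyed by user-friendly names could look roughly like this (a minimal sketch; the decorator API and names are assumptions, not the actual Fast-LLM code):

```python
class Registry:
    """Maps user-friendly names to classes (one registry per base class)."""

    def __init__(self):
        self._classes = {}

    def register(self, name):
        # Decorator that records a class under a short, readable name.
        def decorator(cls):
            self._classes[name] = cls
            return cls
        return decorator

    def get(self, name):
        return self._classes[name]

trainer_registry = Registry()

@trainer_registry.register("gpt")
class GPTTrainer:
    pass

# A config value like `type=gpt` can now resolve to the class,
# instead of spelling out the class name itself.
assert trainer_registry.get("gpt") is GPTTrainer
```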
This is a Triton bug; our implementation of dropless MLP might not be able to handle that many experts. Fixing this will need an in-depth investigation and some implementation work...
Can we please break this PR down? Otherwise it will be too difficult to review. Let's keep this one about the minimalistic `generate`, and move the rest to the next PR.
AFAIK, all checks that can be done during validation are done there. But some of them can't really be done during validation because of missing information. The most important category...
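Splitting checks between config validation and a later point where the missing information becomes available might look roughly like this (a hypothetical sketch; the class and method names are illustrative, not the actual code):

```python
class BatchConfig:
    def __init__(self, batch_size, num_gpus=None):
        self.batch_size = batch_size
        self.num_gpus = num_gpus  # may only be known once the run starts

    def validate(self):
        # Checks that need only the config itself run at validation time.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")

    def check_runtime(self, num_gpus):
        # Checks that need runtime information have to run later.
        if self.batch_size % num_gpus != 0:
            raise ValueError("batch_size must divide evenly across GPUs")

config = BatchConfig(batch_size=8)
config.validate()            # static check passes
config.check_runtime(num_gpus=4)  # deferred check passes: 8 % 4 == 0
```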