Joel Lamy-Poirier
With websockets 7.0 the connection to the browser is lost after some time; for me it is consistently after 20 seconds. The end result is unrelated error messages such as...
Checks always fail when using `float_to_top` and `add_imports` together (version 5.10.1). When running isort on my project, it is always "fixing" all files with no diff, and checks always fail...
I did some extensive investigation, testing and benchmarking, and determined that the following is needed to speed up inference for the Bigcode models (and most text-gen-inference models): 1. **Use `FlashAttention...
Prototype to greatly reduce the post-processing overhead at higher batch sizes.
Flash attn 2.5.7 always complains about the input data type even when it's clearly a correct one. I'm using the base image `nvcr.io/nvidia/pytorch:24.03-py3` ``` >>> import torch, flash_attn >>> from...
# ✨ Description A simplified version of #273, where resources are allocated statically for each worker. This works fine, with some big caveats: * Multi-gpu tests and spawned processes run...
# ✨ Description Adjust the rotary embeddings, peft and normalization layers to use the new dynamic classes. Do some cleanup and refactoring for rotary embeddings. Add an option to disable...
# ✨ Description Allows running tests in parallel and using all the available gpus so we can run lots of tests fast. Pytest-xdist is already relatively good, but puts everything...
# ✨ Description Fix: #126 Generalize the concept of dynamic config class from the dataset mechanism to all config classes. I opted for a unique global registry of all config...
# ✨ Description This removes bloat and ad-hoc registries for the cli, and instead uses a dynamic config class to get the exact same result in a much simpler way....