FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
Hi! I'm trying to reproduce FlexGen results and compare them with more naive methods, and I'm getting weird results. Can you please help me? __edit:__ added benchmark details and a [minimalistic...
I'm on a system hard-limited to 40 GB of CPU RAM + swap. When I try to load opt-30b, the process is killed due to memory exhaustion. If I load the model...
Are multi-line answers in the chatbot cut off? It seems like it "has more to say" sometimes, but the output is trimmed to just the first line. For example...
On Windows at least, it seems the path is not obeyed and it keeps downloading into the .cache directory of the c:\ file system (where I don't have enough space). I've...
Awesome work! Any plans on having this as a strategy plugin for pytorch-lightning? (like DDP/DeepSpeed/ColossalAI) (https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html)
Hello! I got an error when running: python -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0 ``` warmup - init weights Traceback (most recent call last): File...
Hello! I propose adding support for the Erebus family of models; these are fine-tuned versions of the original OPT. I looked at the code, and the support is not...
Think about what Automatic1111 did for Stable Diffusion: from a rather crude one-shot image generator, significantly worse than its commercial counterparts, it has become a distribution with thousands of features,...
Hi, I'm trying to run the benchmark `bench_30b_1x4.sh` (except that I set `N_GPUS=2`), but I get the following Python exception: ``` rank #1: TypeError: sequence item 6: expected str instance,...
When using offloading in flex_opt, I get a PermissionError on Windows. This line throws the error: https://github.com/FMInference/FlexGen/blob/main/flexgen/pytorch_backend.py#L664 ``` os.remove(tensor.data) ``` It happens because the filepath `tensor.data` is still open as...