FastChat
[WIP] Reduce peak memory to 8 GB
When applying a delta, we currently load two complete models at the same time, which puts heavy strain on CPU memory. This PR reduces peak usage so the delta can be applied within 10 GB of memory.
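The core idea can be sketched as follows: instead of materializing the base, delta, and target state dicts all at once, the base weights are updated in place, one parameter at a time, so peak memory stays close to a single model. This is a minimal illustration with toy lists standing in for real tensors; the helper name `apply_delta_inplace` is hypothetical, not the actual FastChat API.

```python
def apply_delta_inplace(base_state, delta_state):
    """Add delta weights into base_state entry-by-entry, in place.

    Only one parameter's delta is combined at a time, so we never
    hold a second full copy of the model in memory.
    """
    for name, delta in delta_state.items():
        base_state[name] = [b + d for b, d in zip(base_state[name], delta)]
    return base_state  # base_state now holds the target model's weights

# Toy "state dicts" standing in for real model tensors.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.1, -0.2], "layer.bias": [0.0]}

target = apply_delta_inplace(base, delta)
print(target["layer.weight"])  # [1.1, 1.8]
```

In the real implementation the same effect can be achieved by processing the checkpoint in shards, loading and freeing one shard at a time.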
- Run with:

```bash
python3 -m fastchat.model.apply_delta \
    --base /path/to/llama-13b \
    --target /output/path/to/vicuna-13b \
    --delta lmsys/vicuna-13b-delta-v0
```