
[WIP] Reduce peak memory to 8 GB

Open • andy-yang-1 opened this issue • 0 comments

When applying the delta, the script currently loads two complete models (the base model and the delta) into CPU memory at the same time, which requires a large amount of RAM. This PR makes it possible to apply the delta within 10 GB of memory.
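A rough estimate shows why holding both models at once is expensive. It assumes the weights are loaded in fp16 (2 bytes per parameter), takes "13B" as about 13 × 10^9 parameters, and ignores framework overhead and temporary copies, so the real peak is likely higher:

```latex
\begin{aligned}
\text{one 13B model in fp16} &\approx 13\times10^{9}\ \text{params}\times 2\ \tfrac{\text{bytes}}{\text{param}} = 26\times10^{9}\ \text{bytes} \approx 26\ \text{GB},\\
\text{base} + \text{delta resident together} &\approx 2\times 26\ \text{GB} = 52\ \text{GB}.
\end{aligned}
```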

  • Run with:
    python3 -m fastchat.model.apply_delta \
      --base /path/to/llama-13b \
      --target /output/path/to/vicuna-13b \
      --delta lmsys/vicuna-13b-delta-v0
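This issue does not show how the PR itself lowers peak memory. One general way to keep memory bounded is to add the delta to the base weights one shard at a time, so only one shard of each model is resident at any moment. The sketch below illustrates that idea under stated assumptions. The JSON shard files, the `Weights` type, and the helpers `save_shard`, `load_shard` and `apply_delta_sharded` are all hypothetical stand-ins, not FastChat's API. It also assumes that shard `i` of the base and shard `i` of the delta hold the same parameter names with equal shapes, and that target = base + delta:

```python
import json
import os
import tempfile
from typing import Dict, List

# Hypothetical stand-in for model weights: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def save_shard(path: str, weights: Weights) -> None:
    """Write one shard (parameter name -> values) to disk as JSON."""
    with open(path, "w") as f:
        json.dump(weights, f)


def load_shard(path: str) -> Weights:
    """Read one shard back from disk."""
    with open(path) as f:
        return json.load(f)


def apply_delta_sharded(
    base_shards: List[str], delta_shards: List[str], out_dir: str
) -> List[str]:
    """Compute target = base + delta, one pair of shards at a time.

    Assumes (hypothetically) that base_shards[i] and delta_shards[i] contain
    the same parameter names with equal lengths, so only one shard of each
    model has to be loaded at any moment.
    """
    if len(base_shards) != len(delta_shards):
        raise ValueError("base and delta must have the same number of shards")
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    for i, (base_path, delta_path) in enumerate(zip(base_shards, delta_shards)):
        base = load_shard(base_path)
        delta = load_shard(delta_path)
        if base.keys() != delta.keys():
            raise ValueError(f"shard {i}: parameter names differ")
        target = {}
        for name, base_w in base.items():
            delta_w = delta[name]
            if len(base_w) != len(delta_w):
                raise ValueError(f"shard {i}: shape mismatch for {name}")
            target[name] = [b + d for b, d in zip(base_w, delta_w)]
        out_path = os.path.join(out_dir, f"target-{i:05d}.json")
        save_shard(out_path, target)
        out_paths.append(out_path)
        # Drop references so this shard can be freed before the next one loads.
        del base, delta, target
    return out_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base_path = os.path.join(tmp, "base-00000.json")
        delta_path = os.path.join(tmp, "delta-00000.json")
        save_shard(base_path, {"w": [1.0, 2.0]})
        save_shard(delta_path, {"w": [0.5, -1.0]})
        (out,) = apply_delta_sharded([base_path], [delta_path], os.path.join(tmp, "out"))
        print(load_shard(out))  # {'w': [1.5, 1.0]}
```

Peak memory in this sketch scales with the largest shard rather than with the whole model. A real implementation would operate on framework tensor files instead of JSON lists. It might also need to handle parameters whose shapes differ between base and delta (for example, if the delta extends the vocabulary), a case this sketch simply rejects.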
    

andy-yang-1 • Apr 13 '23 06:04