swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
## Basic Test

1. offline mode: `python3 examples/offline.py --model-path ./models/Llama-3.2-1B`
2. online mode: `python3 examples/online.py --model-path ./models/Llama-3.2-1B`
3. api_server: `python3 swiftllm/server/api_server.py --model-path ./models/Llama-3.2-1B/ --host 0.0.0.0 --port 8082` (see the client sketch below)
...
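Once the API server is up, a quick sanity check might look like the sketch below. Note that the `/generate` endpoint and the JSON fields are assumptions for illustration, not swiftLLM's documented API; check the server code for the actual routes.

```python
# Hypothetical client sketch: the "/generate" endpoint name and the
# request/response fields are assumptions, not swiftLLM's documented API.
import requests

resp = requests.post(
    "http://0.0.0.0:8082/generate",            # host/port from the command above
    json={"prompt": "Hello, my name is", "max_tokens": 32},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```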
Could you please provide the relevant code for performance testing? In my own tests, performance seems to be worse than vLLM's.
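Absent an official benchmark script in this excerpt, a rough apples-to-apples comparison can be made with a wall-clock throughput harness like the sketch below; `generate` here is a hypothetical stand-in for each engine's batch-generation entry point (e.g. whatever `examples/offline.py` calls internally), not a swiftLLM API.

```python
# Minimal throughput harness sketch. `generate` is a hypothetical stand-in
# for an engine's batch-generation entry point; wire it to swiftLLM's and
# vLLM's offline APIs respectively to compare like for like.
import time

def measure_throughput(generate, prompts, max_tokens=128):
    start = time.perf_counter()
    outputs = generate(prompts, max_tokens=max_tokens)   # list of generated token lists
    elapsed = time.perf_counter() - start
    total_tokens = sum(len(tokens) for tokens in outputs)
    return total_tokens / elapsed                        # tokens per second

# Example usage:
# tps = measure_throughput(my_generate_fn, ["Hello"] * 64)
# print(f"{tps:.1f} tok/s")
```

Keeping the prompt set, `max_tokens`, and batch size identical across both engines matters more than the harness itself; differing sampling settings can easily dominate the measured gap.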
Hello, on my first try running swiftLLM with Llama-3.2-1B, I got the following error. Python version: 3.9.20, torch version: 2.4.0
The current code doesn't seem to support parallelism. Is there any plan to support it in the future?
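For context on what such support would involve: tensor parallelism typically shards each weight matrix across GPUs and combines the partial results with a collective. A schematic sketch of column-parallel sharding for a linear layer is shown below; this is illustration only, not swiftLLM code, and it assumes a `torch.distributed` process group has already been initialized.

```python
# Schematic tensor-parallel sketch (illustration only, not swiftLLM code).
# Each rank keeps a column shard of the weight and all-gathers the outputs.
# Assumes dist.init_process_group(...) has already been called.
import torch
import torch.distributed as dist

def column_parallel_linear(x, full_weight, rank, world_size):
    # Shard the output dimension of the weight across ranks.
    shard = full_weight.chunk(world_size, dim=0)[rank]   # (out/world, in)
    local_out = x @ shard.T                              # partial output columns
    gathered = [torch.empty_like(local_out) for _ in range(world_size)]
    dist.all_gather(gathered, local_out)                 # collect all shards
    return torch.cat(gathered, dim=-1)                   # full (batch, out) output
```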
Even though the default value of `--max-batch-size` is 512, I could not get the batch size to exceed 100. I also ran it on GPUs with much more VRAM (from 6 GB to 48 GB), changed...
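One plausible explanation (an assumption, not confirmed from the code) is that the effective batch size is capped by how many KV-cache slots fit in the remaining GPU memory, regardless of `--max-batch-size`. A back-of-the-envelope check, using assumed Llama-3.2-1B config values:

```python
# Back-of-the-envelope KV-cache capacity check. The model numbers below are
# assumptions for Llama-3.2-1B in fp16; substitute the real config values.
layers, kv_heads, head_dim = 16, 8, 64
bytes_per_elem = 2                              # fp16
max_seq_len = 2048

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem

def max_concurrent_seqs(free_vram_gb):
    per_seq = kv_bytes_per_token * max_seq_len  # worst case: full-length sequences
    return int(free_vram_gb * 1024**3 // per_seq)

for gb in (6, 48):
    print(gb, "GB ->", max_concurrent_seqs(gb), "sequences")
```

Under these assumptions, roughly 6 GB of free VRAM supports on the order of 100 worst-case sequences, which would match the observed cap; if the allocator budgets KV memory from a fixed fraction of total VRAM rather than from free VRAM, a larger card would not necessarily raise the limit.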
I saw these lines in the source code: https://github.com/interestingLSY/swiftLLM/blob/682cf9a28f97f7490409981a2f181528f377eb5d/swiftllm/worker/model.py#L116-L122 After `forward`, some memory has been released, for example the memory for intermediate activations, the input ids, etc., so could the...
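For reference on the general PyTorch behavior (not the linked swiftLLM code specifically): intermediate activations from a no-grad forward pass are freed back to the caching allocator as soon as the tensors go out of scope, and `torch.cuda.empty_cache()` further releases cached blocks to the CUDA driver. A minimal sketch:

```python
# Sketch of how intermediate-activation memory behaves after a forward pass.
# `model` is assumed to return logits of shape (batch, seq, vocab).
import torch

@torch.inference_mode()
def forward_once(model, input_ids):
    logits = model(input_ids)
    return logits[:, -1, :].clone()   # keep only what's needed downstream

# Intermediates allocated inside forward_once are freed (returned to
# PyTorch's caching allocator) when the function returns;
# torch.cuda.empty_cache() additionally hands the cached blocks back
# to the CUDA driver so other processes can use them.
```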