CoderPanda

Results: 3 comments by CoderPanda

@lindstro was this something that made it into this release (1.0.0)? The release notes don't mention it. If not, is it in the works for the release later this year?

Thanks @lindstro for the update. Is there also a tentative timeline for release 0.5.6 (which seems likely to be coming soon 🍾)?

Getting the same error while trying to serve Mixtral-8x7B-Instruct-v0.1 on vLLM 0.2.6 with `--tensor_parallel_size 2`.

Environment:

- CUDA 12.2
- NVIDIA driver 535.104.12
- A100 host with 8 GPU cards
- Python 3.11.5
- vLLM 0.2.6
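For context, a minimal sketch of the kind of serve command that hits this, assuming vLLM 0.2.6's OpenAI-compatible server entrypoint and the Hugging Face model ID (the exact flags used in my run may differ):

```shell
# Hedged sketch: launch vLLM 0.2.6's OpenAI-compatible API server
# for Mixtral across 2 GPUs. Model ID and entrypoint are assumptions
# based on the standard vLLM docs, not copied from my exact invocation.
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 2
```

(Note the CLI flag is usually spelled with dashes, `--tensor-parallel-size`.)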