Shanshan Shen

Results 2 issues of Shanshan Shen

# What does this PR do? Add a `get_stream_cls()` method to the `Platform` class, decoupling the stream-creation code from specific hardware, e.g., `cuda`.
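
The idea can be illustrated with a minimal sketch. It is not the PR's actual code: only `Platform` and `get_stream_cls()` come from the description above, while the use of `classmethod`, the subclasses `CudaPlatform` and `NpuPlatform`, the stand-in stream classes, and the helper `make_stream` are hypothetical names chosen for illustration. Plain Python classes stand in for real stream types so the example runs without any accelerator library installed.

```python
# Hypothetical sketch: only `Platform` and `get_stream_cls` are named in the
# PR description; everything else here is an illustrative assumption.


class FakeCudaStream:
    """Stand-in for a CUDA stream type."""
    device_type = "cuda"


class FakeNpuStream:
    """Stand-in for a stream type on some other accelerator."""
    device_type = "npu"


class Platform:
    """Base class; each hardware backend overrides get_stream_cls()."""

    @classmethod
    def get_stream_cls(cls):
        # Backends that do not override this method fail loudly.
        raise NotImplementedError(
            f"{cls.__name__} does not provide a stream class"
        )


class CudaPlatform(Platform):
    @classmethod
    def get_stream_cls(cls):
        return FakeCudaStream


class NpuPlatform(Platform):
    @classmethod
    def get_stream_cls(cls):
        return FakeNpuStream


def make_stream(platform):
    """Hardware-agnostic caller: ask the platform for its stream type,
    then instantiate it, instead of hard-coding a CUDA stream."""
    stream_cls = platform.get_stream_cls()
    return stream_cls()
```

With this shape, calling code only needs a reference to the current platform: `make_stream(CudaPlatform)` and `make_stream(NpuPlatform)` each return a stream object of the appropriate type, and no call site mentions `cuda` directly.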

speculative-decoding

# What does this PR do? When running vLLM with a `MiniCPM` model (not MoE), an error is raised indicating that `triton` is not supported on some devices. Thus, I...