Kunshang Ji
This is a follow-up to PR #1028.
### Anything you want to discuss about vLLM. # Progress - [ ] CMake and build system for Intel XPU/SYCL - [ ] vLLM custom op implementation in SYCL source...
**This PR is the first PR for RFC #3725** _Intel is contributing both Intel CPU and Intel GPU support for vLLM. The initial PR #3634 for CPU has already been merged._ This PR...
### What changes were proposed in this pull request? Update the Velox version ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No ### How was...
### What changes were proposed in this pull request? Add the Gluten integration patch ### Why are the changes needed? For Gluten integration ### Does this PR introduce _any_ user-facing change?...
Currently, when aggregate push down is enabled, the following TPC-DS queries fail: q9, q44, and q54. I will figure out the root cause and fix it.
TODO List: - [ ] TypeSignature Test - [x] BOOLEAN - [x] TINYINT - [x] SMALLINT - [x] INTEGER - [x] BIGINT - [x] REAL - [x] DOUBLE - [x]...
IPEX provides a `varlen_attention` API that can perform better on prompt computation than `torch.nn.functional.scaled_dot_product_attention`. This PR adds support for it. **BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN...
1. Refactor GPU executors 2. Remove repeated code in XPU executors 3. Add a multi-process executor for XPU 4. Cover #6013 5. Add pipeline parallel support **BEFORE SUBMITTING, PLEASE READ...