bopeng1234
Hi, we are running benchmarks on 3B models with the all-in-one script. The models are [phi3-4k](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [phi3-128k](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct), and [starcoder2-3b](https://huggingface.co/bigcode/starcoder2-3b). Environment: ThinkBookX2024, Ultra9-185H, Windows11pro-23H2, 32GB LPDDR5x8400MHz, Arc Driver: 31.0.101.5445, ipex-llm[xpu] 2.1.0b20240506. Questions here:...
On Ubuntu 22.04 with an i5-1135G7 and the iGPU enabled (oneAPI + compute runtime + Level Zero loader), kernel 6.5.0-35-generic (installed by following this [link](https://dgpu-docs.intel.com/driver/client/overview.html#install-out-of-tree-driver)), run the `oneapi matrix_mult` sample in one terminal: ``` oneAPI-samples/DirectProgramming/C++SYCL/DenseLinearAlgebra/matrix_mul$ ./matrix_mul_dpc Device:...
Add GroupQueryAttention op decomposition logic to the NPUW llm_compiled_model, following the current implementation of ScaledDotProductAttentionDecomposition: https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L221
Add a basic implementation of the RotaryEmbedding op: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.RotaryEmbedding
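For readers unfamiliar with the op, here is a minimal pure-Python sketch of the rotation that rotary embedding applies to a single head vector. It is not the code from this PR. The function name `rotary_embedding`, the `base` parameter, and the inline angle computation are assumptions made for illustration; the linked contrib op takes precomputed cos/sin caches as inputs instead.

```python
import math


def rotary_embedding(x, position, base=10000.0, interleaved=False):
    """Rotate one head vector `x` (even length) for a token at `position`.

    Hypothetical helper for illustration only: angles are computed inline
    from `base` rather than read from cos/sin caches.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Each pair of elements shares one rotation frequency.
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        if interleaved:
            a, b = 2 * i, 2 * i + 1   # pairs are adjacent elements
        else:
            a, b = i, i + half        # pairs are split across the two halves
        # Standard 2-D rotation of the pair (x[a], x[b]) by theta.
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out
```

Because each pair is only rotated, the vector norm is unchanged, and position 0 leaves the input as-is; both properties are convenient sanity checks when comparing against a reference implementation.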
The current implementation fails when stash_type does not match X's element type, for example when stash_type is fp32 and X is fp16.
# Existing Sample Changes

## Description

The [UV tool](https://github.com/astral-sh/uv) is designed to streamline the management of Python environments for multiple test cases. One of its standout features is its ability...
Move the ConvertWeightCompressedConv1x1ToMatmul pattern and its test from the intel_gpu plugin to the common transformation folder. The purpose is to reuse it on both the CPU and GPU sides. ###...