Tingqian Li issues

Results 11 issues of Tingqian Li

Only clone weights when DAZ is not set

### Details: - Optimize weights memory usage by avoiding cloning weights with sub-normals when DAZ is set - Introduce a util function `is_denormals_as_zero_set()` - fix duplicate definitions issue by separate...

category: CPU
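The idea in the entry above can be sketched as follows. This is a minimal illustration and not the plugin's code: it assumes the clone exists to zero out sub-normal weights, and it takes the DAZ state as a plain boolean standing in for the result of `is_denormals_as_zero_set()` (on x86 the DAZ flag lives in the MXCSR register). The names `is_subnormal` and `clone_without_subnormals` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A float is sub-normal when its exponent field is zero and its mantissa is
// non-zero. Testing the raw bits keeps the result independent of the
// current DAZ/FTZ mode of the CPU.
inline bool is_subnormal(float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

// Hypothetical helper: writes a cleaned copy to `out` only when one is needed.
// `daz_set` stands in for the result of is_denormals_as_zero_set().
// Returns true if a clone was made, false if the original can be used as-is.
inline bool clone_without_subnormals(const std::vector<float>& weights,
                                     bool daz_set,
                                     std::vector<float>& out) {
    assert(&out != &weights && "out must not alias the original weights");
    if (daz_set) {
        return false;  // sub-normal inputs are already treated as zero
    }
    bool found = false;
    for (float w : weights) {
        if (is_subnormal(w)) { found = true; break; }
    }
    if (!found) {
        return false;  // nothing to flush, so no copy is needed
    }
    out = weights;
    for (float& w : out) {
        if (is_subnormal(w)) w = 0.0f;
    }
    return true;
}
```

With DAZ set, the function returns early and the original buffer is shared, which is where the memory saving would come from under the stated assumption.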

[CPU] Optimize conv weights reordering in dynamic shape

### Details: - When dummyShape produces an incompatible layout and the weights are constant, reorder them only once and put the result into the weightSharing cache, to avoid reordering them in every...

category: CPU
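The reorder-once pattern described in the entry above might look roughly like this. It is a sketch under assumptions, not the plugin's weightSharing implementation: `WeightCache`, `Blob`, and the string key are hypothetical stand-ins, and the cache is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Blob = std::vector<float>;  // hypothetical stand-in for a weights buffer

// Hypothetical cache: the (expensive) reorder runs only on the first lookup
// of a key; later lookups return the shared, already reordered blob.
class WeightCache {
public:
    std::shared_ptr<const Blob> find_or_create(const std::string& key,
                                               const std::function<Blob()>& create) {
        assert(create && "a reorder callback is required");
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // cache hit: no reorder
        }
        std::shared_ptr<const Blob> blob = std::make_shared<Blob>(create());
        cache_.emplace(key, blob);
        return blob;
    }

private:
    std::map<std::string, std::shared_ptr<const Blob>> cache_;
};
```

In a dynamic-shape scenario, the key would need to encode everything that affects the reordered result (for example the target layout), so that repeated inferences with the same layout hit the cache.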

Memory consumption optimization

category: Optimum plugin

[CPU] rnn: add pd to cache for preparing memory accordingly

### Details: The original implementation uses `auto itpd = descs[0].createPrimitiveDescriptorIterator(getEngine(), dnnl::primitive_attr());` to recreate a primitive descriptor from `descs[0]` in order to query the actual memory descriptor used by the current primitive, but this only...

category: CPU

support (src_f32,wei_fp16,dst_f32) in ip

# Description Add support for (src_f32, weight_f16, dst_f32) in inner product Fixes CVS-133453 # Checklist ## General - [ ] Do all unit and benchdnn tests (`make test` and `make...

Cherry-pick binary post optimization

# Description Cherry-pick oneDNN binary post-ops optimizations into the fork. The constructor of `rhs_arg_static_params_t` now requires allocating an additional general-purpose register for the address cache; the register can be...

[CPU] Support weight-compression dt s8

### Details: - FC with symmetrically quantized/compressed weights may have i8 (instead of u8) as the weight data type (this saves the zero-point subtraction cost); this change adds support for such...

category: CPU
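To make the zero-point saving concrete, here is a small scalar sketch comparing the two weight encodings. It is illustrative only and assumes per-tensor scales; the function names `dot_u8` and `dot_s8` are hypothetical, and real kernels are vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Asymmetric u8 weights: every weight needs (w - zero_point) before it can
// be multiplied with the activation.
inline float dot_u8(const std::vector<float>& x,
                    const std::vector<std::uint8_t>& w,
                    float scale, int zero_point) {
    assert(x.size() == w.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i] * static_cast<float>(static_cast<int>(w[i]) - zero_point);
    }
    return acc * scale;
}

// Symmetric s8 weights: the zero point is implicitly 0, so the subtraction
// in the inner loop disappears.
inline float dot_s8(const std::vector<float>& x,
                    const std::vector<std::int8_t>& w,
                    float scale) {
    assert(x.size() == w.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i] * static_cast<float>(w[i]);
    }
    return acc * scale;
}
```

Both functions produce the same result when the u8 weights are the s8 weights shifted by the zero point (for example 128); the s8 path simply does one fewer operation per weight.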

Support weight-compressed data type s8

# Description Inner-product with symmetrically quantized/compressed weights may have s8 as the weight data type (this saves the zero-point subtraction cost); this change adds support for such weight dt. Fixes #...

[CPU] WA for s4 weight-compression inner-product low-perf on ICX

### Details: - *item1* - *...* ### Tickets: - *CVS-146047* - *CVS-146312*

category: CPU

[CPU] Add per-token asym INT8 dynamic quantization support to QKV/MLP node

### Details: - To use AMX-INT8 to boost the performance of QKV/MLP layers in LLMs, we need dynamic per-token INT8 quantization, according to many research papers (SmoothQuant, for example) and reference...

category: CPU
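As a rough illustration of what per-token asymmetric INT8 dynamic quantization means, the sketch below quantizes each activation row (token) to u8 with its own scale and zero point, computed at run time from that row's min and max. This is a generic textbook formulation under assumed conventions (u8 range 0..255, round-to-nearest), not the code of the QKV/MLP node; `QuantizedRow` and `quantize_row_asym_u8` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedRow {
    std::vector<std::uint8_t> q;  // quantized values in [0, 255]
    float scale;                  // dequantization step for this row
    int zero_point;               // u8 code that represents 0.0f
};

// Quantize one token (row) so that x is approximately (q - zero_point) * scale.
inline QuantizedRow quantize_row_asym_u8(const std::vector<float>& row) {
    assert(!row.empty());
    auto mm = std::minmax_element(row.begin(), row.end());
    const float mn = *mm.first;
    const float mx = *mm.second;

    QuantizedRow out;
    out.scale = (mx - mn) / 255.0f;
    if (out.scale == 0.0f) {
        out.scale = 1.0f;  // constant row: any non-zero scale works
    }
    out.zero_point = static_cast<int>(std::lround(-mn / out.scale));

    out.q.reserve(row.size());
    for (float x : row) {
        int v = static_cast<int>(std::lround(x / out.scale)) + out.zero_point;
        v = std::min(std::max(v, 0), 255);  // clamp to the u8 range
        out.q.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}

// Per-token: every row gets its own scale and zero point.
inline std::vector<QuantizedRow> quantize_per_token(
        const std::vector<std::vector<float>>& tokens) {
    std::vector<QuantizedRow> result;
    result.reserve(tokens.size());
    for (const auto& row : tokens) {
        result.push_back(quantize_row_asym_u8(row));
    }
    return result;
}
```

Because the scale and zero point are derived from each row independently, a token with a large dynamic range does not reduce the precision available to the other tokens.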

category: build