Tingqian Li
### Details: - Optimize weights memory usage by avoiding cloning weights that contain sub-normals when DAZ is set - Introduce a utility function `is_denormals_as_zero_set()` - Fix the duplicate-definitions issue by separate...
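The DAZ-based decision described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: `daz_enabled_sketch`, `has_subnormals`, and `needs_flushed_copy` are hypothetical names, and the assumption is that weights were previously cloned so that sub-normal values could be flushed to zero. On x86, DAZ is bit 6 (`0x0040`) of the MXCSR register.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <xmmintrin.h>
#define SKETCH_HAS_MXCSR 1
#endif

// Hypothetical stand-in for a DAZ query: reads MXCSR bit 6 on x86.
// On other architectures this sketch conservatively reports false.
inline bool daz_enabled_sketch() {
#ifdef SKETCH_HAS_MXCSR
    return (_mm_getcsr() & 0x0040u) != 0;
#else
    return false;
#endif
}

// True if any float32 value is sub-normal: exponent bits are all zero
// while the mantissa is non-zero.
inline bool has_subnormals(const float* data, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t bits = 0;
        std::memcpy(&bits, &data[i], sizeof(bits));
        if ((bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0)
            return true;
    }
    return false;
}

// Assumed policy: a flushed copy of the weights is only needed when they
// contain sub-normals and the hardware does not already treat them as zero.
inline bool needs_flushed_copy(const float* data, std::size_t count, bool daz) {
    return !daz && has_subnormals(data, count);
}
```

A caller would pass `daz_enabled_sketch()` as the `daz` argument; keeping it a parameter makes the policy easy to exercise in isolation.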
### Details: - If dummyShape produces an incompatible layout and the weights are constant, reorder them only once and put the result into the weightSharing cache, to avoid reordering them in every...
### Details: The original implementation uses `auto itpd = descs[0].createPrimitiveDescriptorIterator(getEngine(), dnnl::primitive_attr());` to recreate a primitive descriptor from `descs[0]` in order to query the actual memory descriptor used by the current primitive, but this only...
# Description Add support for (src_f32, weight_f16, dst_f32) in inner product. Fixes CVS-133453 # Checklist ## General - [ ] Do all unit and benchdnn tests (`make test` and `make...
# Description Cherry-pick oneDNN binary post-ops optimizations into the fork. The constructor of `rhs_arg_static_params_t` now requires allocating an additional general-purpose register for the address cache; the register can be...
### Details: - FC with symmetrically quantized/compressed weights may use i8 (instead of u8) as the weight data type, which saves the zero-point subtraction cost; this change adds support for such...
# Description Inner-product with symmetrically quantized/compressed weights may use s8 as the weight data type, which saves the zero-point subtraction cost; this change adds support for such weight data types. Fixes #...
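To illustrate why a symmetric s8 weight saves the zero-point subtraction, here is a minimal scalar sketch. It is not the kernel from either PR: the function names are hypothetical, and a single per-tensor scale is assumed for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Asymmetric u8 weights: each weight needs (w - zero_point) before scaling.
inline float dot_asymmetric_u8(const std::vector<float>& x,
                               const std::vector<std::uint8_t>& w,
                               float scale,
                               std::uint8_t zero_point) {
    assert(x.size() == w.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
        acc += x[i] * (static_cast<float>(w[i]) - static_cast<float>(zero_point));
    return acc * scale;
}

// Symmetric s8 weights: the zero point is implicitly 0, so the subtraction
// in the inner loop disappears.
inline float dot_symmetric_s8(const std::vector<float>& x,
                              const std::vector<std::int8_t>& w,
                              float scale) {
    assert(x.size() == w.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
        acc += x[i] * static_cast<float>(w[i]);
    return acc * scale;
}
```

With u8 weights `{127, 128, 130}` and zero point 128, the asymmetric path computes the same result as the symmetric path with s8 weights `{-1, 0, 2}`, but with one extra subtraction per element.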
### Tickets: - *CVS-146047* - *CVS-146312*
### Details: - To use AMX-INT8 to boost the performance of QKV/MLP layers in LLMs, we need dynamic per-token INT8 quantization, according to many research papers (SmoothQuant, for example) and reference...
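A minimal scalar sketch of dynamic per-token INT8 quantization is shown below. It assumes symmetric quantization with one scale per token (row) derived from the row's absolute maximum; the struct and function names are hypothetical, and the actual implementation in the PR may differ (for example, by being vectorized or by using another rounding scheme).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantizedTokens {
    std::vector<std::int8_t> data;  // row-major [tokens x channels]
    std::vector<float> scales;      // one dequantization scale per token
};

// Quantize each token (row) independently: scale = max|x| / 127, q = round(x / scale).
inline QuantizedTokens quantize_per_token(const std::vector<float>& x,
                                          std::size_t tokens,
                                          std::size_t channels) {
    assert(x.size() == tokens * channels);
    QuantizedTokens q;
    q.data.resize(x.size());
    q.scales.resize(tokens);
    for (std::size_t t = 0; t < tokens; ++t) {
        const float* row = x.data() + t * channels;
        float amax = 0.0f;
        for (std::size_t c = 0; c < channels; ++c)
            amax = std::max(amax, std::fabs(row[c]));
        // An all-zero row gets scale 1 to avoid division by zero.
        const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
        q.scales[t] = scale;
        for (std::size_t c = 0; c < channels; ++c) {
            const float v = std::nearbyint(row[c] / scale);
            q.data[t * channels + c] =
                static_cast<std::int8_t>(std::max(-127.0f, std::min(127.0f, v)));
        }
    }
    return q;
}
```

Dequantization multiplies each INT8 value by its token's scale, so after an INT8 matrix multiply the per-token scale can be applied once per output row.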