Digant Desai
Summary: * Works only with Simpleperf (cpu) for now, silently ignored by other profilers * The `extra_args` field defaults to an empty string, and goes to the simpleperf cmdline like...
Summary: - For ARM/ARM64, with Linux or Mach: set to true if the first core in the least capable cluster is a little core, for others: set to false (for...
# 4-bit Weight Blockwise Quantization for qd8-f[16,32] ## Introduction This proposal aims to explore the implementation of blockwise quantization for 4-bit weights for qd8-f16 and qd8-f32 in XNNPACK, a concept...
* Add a way to get the shape of the external value. Typically used to get dynamic shape of an external output value after runtime invocation. * Add a couple...
Summary: Make it work Differential Revision: D56287998
Summary: Method name update Differential Revision: D56072265
**Issue** - When a subgraph is created with a QD8 tensor, only the size for the actual tensor data is updated in the value. The space reservation or memory planning...
Test `packing-test --gtest_filter="PACK_QD8_F32_QB4W_GEMM_GOI_W.*"`
The goal of this issue is to monitor development progress for this rather large feature with multiple contributors involved. Additionally, it serves as a vehicle to make open questions, and...
before: fp_acc = 1/16 * (vksum * 16 + float(int_acc * 16) * scale)
after:  fp_acc = vksum + float(int_acc * 16) * scale / 16
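
A quick sanity check that the two forms above agree, written as a minimal Python sketch. The names `vksum`, `int_acc`, and `scale` come from the expressions themselves; the helper function names and the sample values are made up for illustration and are not taken from the actual kernels.

```python
import math


def fp_acc_before(vksum: float, int_acc: int, scale: float) -> float:
    # Original form: scale vksum up by 16, add the scaled accumulator,
    # then divide the whole sum by 16.
    return 1 / 16 * (vksum * 16 + float(int_acc * 16) * scale)


def fp_acc_after(vksum: float, int_acc: int, scale: float) -> float:
    # Rewritten form: vksum is used as-is, and only the accumulator
    # term is divided by 16.
    return vksum + float(int_acc * 16) * scale / 16


# Illustrative inputs only (hypothetical values, not real kernel data).
samples = [(0.5, 3, 0.25), (-12.75, -40, 0.0123), (1000.0, 12345, 1e-3)]

for vksum, int_acc, scale in samples:
    before = fp_acc_before(vksum, int_acc, scale)
    after = fp_acc_after(vksum, int_acc, scale)
    # Distributing the 1/16 factor over the sum gives the "after" form,
    # so the two results should match up to floating-point rounding.
    assert math.isclose(before, after, rel_tol=1e-9, abs_tol=1e-12)
```

The rewritten form drops the `vksum * 16` multiply, which is the only difference between the two expressions.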