[CPU] Optimize some kernels from CPU backend
Summary
- Optimize the Insert/Extract/Transpose kernels from the CPU backend by replacing the address arithmetic performed at run-time with a simple access pattern based on offsets only, generated at compile-time by a tensor utility class named TensorAccessPattern.
- Small optimizations regarding pointer arithmetic for other kernels, e.g. resize, softmax.
Note: The use of macro definitions like libjit_getXYZW should be prohibited since it results in poor performance. Instead, kernels should avoid these macros wherever possible by exploiting more linear access patterns or offsets pre-computed at compile-time.
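To illustrate the idea, here is a minimal sketch of a 4D transpose written both ways. It is not the code from this PR: `flatIndex4D`, `transposeIndexed`, `buildTransposeOffsets` and `transposeWithOffsets` are hypothetical names, and `flatIndex4D` is only a stand-in for an index helper in the style of libjit_getXYZW. The sketch assumes the tensor shapes are known before the kernel runs, so the offset table can be built once ahead of time.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Dims4 = std::array<std::size_t, 4>;

// Hypothetical stand-in for an index helper: the flat offset is recomputed
// with several multiplications on every element access.
inline std::size_t flatIndex4D(const Dims4 &dims, std::size_t x, std::size_t y,
                               std::size_t z, std::size_t w) {
  return ((x * dims[1] + y) * dims[2] + z) * dims[3] + w;
}

// Baseline: dimension i of the destination is dimension perm[i] of the
// source, and both offsets are computed at run-time for every element.
void transposeIndexed(const float *src, float *dst, const Dims4 &srcDims,
                      const Dims4 &perm) {
  Dims4 dstDims;
  for (std::size_t i = 0; i < 4; ++i) {
    dstDims[i] = srcDims[perm[i]];
  }
  Dims4 d; // destination coordinate
  for (d[0] = 0; d[0] < dstDims[0]; ++d[0])
    for (d[1] = 0; d[1] < dstDims[1]; ++d[1])
      for (d[2] = 0; d[2] < dstDims[2]; ++d[2])
        for (d[3] = 0; d[3] < dstDims[3]; ++d[3]) {
          Dims4 s; // matching source coordinate
          for (std::size_t i = 0; i < 4; ++i) {
            s[perm[i]] = d[i];
          }
          dst[flatIndex4D(dstDims, d[0], d[1], d[2], d[3])] =
              src[flatIndex4D(srcDims, s[0], s[1], s[2], s[3])];
        }
}

// Ahead-of-time step: one source offset per destination element, listed in
// linear destination order.
std::vector<std::size_t> buildTransposeOffsets(const Dims4 &srcDims,
                                               const Dims4 &perm) {
  const Dims4 srcStrides = {srcDims[1] * srcDims[2] * srcDims[3],
                            srcDims[2] * srcDims[3], srcDims[3], 1};
  Dims4 dstDims;
  for (std::size_t i = 0; i < 4; ++i) {
    dstDims[i] = srcDims[perm[i]];
  }
  std::vector<std::size_t> offsets;
  offsets.reserve(dstDims[0] * dstDims[1] * dstDims[2] * dstDims[3]);
  for (std::size_t a = 0; a < dstDims[0]; ++a)
    for (std::size_t b = 0; b < dstDims[1]; ++b)
      for (std::size_t c = 0; c < dstDims[2]; ++c)
        for (std::size_t e = 0; e < dstDims[3]; ++e)
          offsets.push_back(a * srcStrides[perm[0]] + b * srcStrides[perm[1]] +
                            c * srcStrides[perm[2]] + e * srcStrides[perm[3]]);
  return offsets;
}

// Run-time kernel: a single linear loop with no index arithmetic.
void transposeWithOffsets(const float *src, float *dst,
                          const std::vector<std::size_t> &offsets) {
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    dst[i] = src[offsets[i]];
  }
}
```

The trade-off in this sketch is memory: the offset table holds one entry per output element, which may matter on small targets.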
Test Plan
Current unit tests are passing.
Added extra unit tests in TensorsTest.cpp for the new utilities.
@opti-mix Can you take a look at this? Thanks!
@jackm321 Do you have time to review this PR? Thanks!
@yinghai Could you take a look at this? Thanks!
@mciprian13 Could you report how much perf win you observe using the approach in this PR?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Ping.
@mciprian13 Did you address comments from @opti-mix here?
@opti-mix @jfix71 I still need to provide some performance numbers to prove there is an optimization. Note that we are mainly interested in microcontrollers, so the performance results might be biased towards these architectures. I will come back with some numbers to provide the requested info.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Ping.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Ping.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
ping