abacus-develop

Poor performance of zdot.

Open grysgreat opened this issue 1 year ago • 1 comments

Details

The zdot function performs poorly compared to other vector operations (axpy, vecmul).

According to the perf_math_kernel tests, several BLAS functions show the following performance results: [image: benchmark results]

Looking at the implementation of these functions in ABACUS, zdot is written entirely as a plain for loop, whereas the other BLAS functions call the external BLAS library that ABACUS links against. This may be why zdot is relatively slow. [image: zdot implementation]
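To make the contrast concrete, here is a minimal sketch of the two approaches. It is not the ABACUS code: the names `zdot_loop` and `zdot_blas` and the `USE_CBLAS` macro are hypothetical, and it assumes zdot means the conjugated dot product sum_i conj(x[i]) * y[i]. The only library call is `cblas_zdotc_sub` from the standard CBLAS interface, and it is compiled only when `USE_CBLAS` is defined and a CBLAS library is linked.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

#ifdef USE_CBLAS
#include <cblas.h> // assumption: a CBLAS implementation (e.g. OpenBLAS) is available
#endif

using cvec = std::vector<std::complex<double>>;

// Hypothetical name. Hand-written version: sum_i conj(x[i]) * y[i],
// computed over the shorter of the two vectors.
std::complex<double> zdot_loop(const cvec& x, const cvec& y)
{
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        sum += std::conj(x[i]) * y[i];
    }
    return sum;
}

// Hypothetical name. Library-backed version: delegates to CBLAS when
// USE_CBLAS is defined, otherwise falls back to the loop above.
std::complex<double> zdot_blas(const cvec& x, const cvec& y)
{
#ifdef USE_CBLAS
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::complex<double> result(0.0, 0.0);
    // cblas_zdotc_sub writes the conjugated dot product into `result`.
    cblas_zdotc_sub(static_cast<int>(n), x.data(), 1, y.data(), 1, &result);
    return result;
#else
    return zdot_loop(x, y);
#endif
}
```

Timing both functions on the vector lengths used by perf_math_kernel would show whether the loop itself explains the gap or whether something else is responsible.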

Task list for Issue attackers (only for developers)

  • [ ] Reproduce the performance issue on a similar system or environment.
  • [ ] Identify the specific section of the code causing the performance issue.
  • [ ] Investigate the issue and determine the root cause.
  • [ ] Research best practices and potential solutions for the identified performance issue.
  • [ ] Implement the chosen solution to address the performance issue.
  • [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
  • [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • [ ] Review and incorporate any relevant feedback from users or developers.
  • [ ] Merge the improved solution into the main codebase and notify the issue reporter.

grysgreat, Apr 07 '24 08:04

@grysgreat do you have any plan to fix it by yourself?

WHUweiqingzhou, Aug 22 '24 06:08

[three images attached]

Are you sure the zdot used in the benchmark is our own implementation? As I recall, this zdot also calls dot from CBLAS directly. If it just uses CBLAS, we cannot modify it to make it faster.

Critsium-xy, Oct 17 '24 15:10