emphf

Optimize and refine code for C++17

Open · ad3002 opened this issue 1 year ago · 0 comments

Overview

This pull request optimizes and modernizes the compute_mphf_seq part of the codebase. The initial goal was to replace std::string with std::string_view to improve performance and reduce unnecessary allocations; along the way, several other changes were made to improve the efficiency and maintainability of the code.

Changes Made

  1. Replaced std::string with std::string_view:

    • Improved performance by avoiding unnecessary string copies.
    • Reduced memory allocations.
  2. Added noexcept Specifier:

    • Ensured that functions that do not throw exceptions are marked with noexcept.
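    • Potentially improved performance: the compiler can omit exception-handling code around calls to such functions, and std::vector moves (rather than copies) elements during reallocation only when their move constructor is noexcept.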
  3. Utilized constexpr:

    • Enabled compile-time evaluation for constants and functions.
    • Improved code clarity and, potentially, performance.
  4. Optimized Memory Access Patterns:

    • Ensured contiguous memory access to improve cache performance.
    • Used std::vector::reserve to avoid repeated reallocations when the final size is known in advance.
  5. Improved Logic and Readability:

    • Simplified and clarified the logic in various parts of the code.
    • Reduced unnecessary computations and redundant operations.
  6. Modernized C++ Practices:

    • Used std::move to transfer ownership of resources instead of copying them.
    • Used type aliases and consistent naming conventions for better readability.
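The list above describes the changes only in prose. As an illustration, here is a minimal C++17 sketch, not code taken from this pull request, showing what each technique typically looks like in a key-processing routine. Every name in it (fnv1a64, split_lines, KeyRecord, build_records) is hypothetical, and FNV-1a is a stand-in hash chosen for brevity, not necessarily the base hash emphf uses.

```cpp
// Illustrative C++17 sketch; all names are hypothetical, not emphf's API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// constexpr + noexcept: FNV-1a (a stand-in hash, not necessarily the one
// emphf uses) can be evaluated at compile time for constant input.
constexpr std::uint64_t fnv1a64(std::string_view s) noexcept {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
    }
    return h;
}

// std::string_view: split one contiguous buffer into '\n'-separated keys
// without copying characters. Each view points into `buf`, which must
// outlive the returned vector.
std::vector<std::string_view> split_lines(std::string_view buf) {
    std::vector<std::string_view> keys;
    // reserve: a single allocation sized from the newline count, so the
    // push_back calls below never reallocate.
    keys.reserve(static_cast<std::size_t>(
        std::count(buf.begin(), buf.end(), '\n')) + 1);
    std::size_t start = 0;
    while (start < buf.size()) {
        std::size_t end = buf.find('\n', start);
        if (end == std::string_view::npos) end = buf.size();
        keys.push_back(buf.substr(start, end - start));
        start = end + 1;
    }
    return keys;
}

// noexcept moves: std::string's move constructor is noexcept, so the
// implicitly declared one here is too. std::vector relies on this (via
// std::move_if_noexcept) to move, not copy, elements on reallocation.
struct KeyRecord {
    std::string key;
    std::uint64_t hash = 0;
};
static_assert(std::is_nothrow_move_constructible_v<KeyRecord>);

// std::move: take the keys by value and move each string into its record
// instead of copying it.
std::vector<KeyRecord> build_records(std::vector<std::string> keys) {
    std::vector<KeyRecord> out;
    out.reserve(keys.size());
    for (std::string& k : keys) {
        const std::uint64_t h = fnv1a64(k);  // hash before k is moved from
        out.push_back(KeyRecord{std::move(k), h});
    }
    return out;
}
```

For example, calling split_lines on the buffer "alpha\nbeta" returns two views whose data() pointers lie inside that buffer, so no key characters are copied; the trade-off is that the buffer must stay alive while the views are in use.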

Test Results

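The changes were tested on data produced by the synthetic data generation tool, using the command ./gen_synthetic_data test.dat 100000000. The average time per base hash computation fell from 0.0121976 to 0.00789278 usecs (about 35% lower) and the average lookup time from 0.304692 to 0.245929 usecs (about 19% lower). bits_per_key is unchanged at 2.61375, while stddev_lookup_time_percentage rose from 7.33947 to 10.0789.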
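How the tool that printed the logs measures lookups is not shown in this issue. As an illustration only, the following C++17 sketch times a batch of calls with std::chrono::steady_clock and summarizes repeated runs as a mean per-call time in microseconds plus a standard deviation expressed as a percentage of that mean. That definition of the percentage is an assumption (the logs do not define stddev_lookup_time_percentage), and the names time_per_call_usecs, RunSummary and summarize_runs are hypothetical.

```cpp
// Hypothetical timing helpers; not the harness that produced the logs.
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <ratio>
#include <vector>

// Average wall-clock time of one call to f(i), in microseconds, over n calls.
// f should have an observable side effect so the loop is not optimized away.
template <typename F>
double time_per_call_usecs(std::size_t n, F&& f) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) f(i);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return n == 0 ? 0.0 : total.count() / static_cast<double>(n);
}

struct RunSummary {
    double mean_usecs;      // mean of the per-run averages
    double stddev_percent;  // population stddev / mean * 100 (assumed definition)
};

// Summarizes per-run average times; requires at least one run.
RunSummary summarize_runs(const std::vector<double>& runs) {
    assert(!runs.empty());
    double sum = 0.0;
    for (double r : runs) sum += r;
    const double mean = sum / static_cast<double>(runs.size());
    double sq = 0.0;
    for (double r : runs) sq += (r - mean) * (r - mean);
    const double stddev = std::sqrt(sq / static_cast<double>(runs.size()));
    return {mean, mean > 0.0 ? stddev / mean * 100.0 : 0.0};
}
```

Using the population rather than the sample standard deviation is an arbitrary choice here; with few runs the two differ noticeably, which is one reason to check how the real tool defines the metric before comparing numbers.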

Before Optimization:

2024-05-29 17:15:37: Avg. 0.0121976 usecs per base hash computation
2024-05-29 17:15:37: Performing lookups
2024-05-29 17:20:42: Avg. 0.304692 usecs per lookup
avg_lookup_time 0.304692
stddev_lookup_time_percentage   7.33947
bits_per_key    2.61375

After Optimization:

2024-05-29 17:48:29: Avg. 0.00789278 usecs per base hash computation
2024-05-29 17:48:29: Performing lookups
2024-05-29 17:52:34: Avg. 0.245929 usecs per lookup
avg_lookup_time 0.245929
stddev_lookup_time_percentage   10.0789
bits_per_key    2.61375
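The relative improvement implied by the two logs can be checked with a few lines of arithmetic. The helper names below (percent_change, speedup) are made up for this sketch; the inputs are the averages printed in the logs.

```cpp
// Hypothetical helpers for comparing the before/after averages.
#include <cassert>
#include <cmath>

// Percent change from `before` to `after`; negative means the time dropped.
constexpr double percent_change(double before, double after) noexcept {
    return (after - before) / before * 100.0;
}

// Speedup factor: how many times faster `after` is than `before`.
constexpr double speedup(double before, double after) noexcept {
    return before / after;
}

// Base hash: percent_change(0.0121976, 0.00789278) is about -35.3,
//            speedup(0.0121976, 0.00789278) is about 1.55.
// Lookup:    percent_change(0.304692, 0.245929) is about -19.3,
//            speedup(0.304692, 0.245929) is about 1.24.
```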

ad3002, May 29 '24 19:05