Optimize and refine code for C++17
Overview
This pull request focuses on optimizing and modernizing the compute_mphf_seq part of the codebase. Initially, the goal was to replace std::string with std::string_view to improve performance and reduce unnecessary allocations. However, several other improvements were made to enhance the overall efficiency and maintainability of the code.
Changes Made
- Replaced std::string with std::string_view:
  - Improved performance by avoiding unnecessary string copies.
  - Reduced memory allocations.
- Added noexcept specifier:
  - Ensured that functions that do not throw exceptions are marked with noexcept.
  - Potentially improved performance by allowing the compiler to make optimizations.
- Utilized constexpr:
  - Enabled compile-time evaluation for constants and functions.
  - Improved code clarity and potential performance.
- Optimized memory access patterns:
  - Ensured contiguous memory access to improve cache performance.
  - Used std::vector::reserve to avoid multiple reallocations.
- Improved logic and readability:
  - Simplified and clarified the logic in various parts of the code.
  - Reduced unnecessary computations and redundant operations.
- Modernized C++ practices:
  - Leveraged std::move for efficient resource management.
  - Used type aliases and consistent naming conventions for better readability.
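For reviewers who want a concrete picture of the patterns listed above, here is a minimal C++17 sketch. It is not taken from the diff: hash_key, hash_all, HashStore, HashList and the FNV-1a hash are hypothetical stand-ins chosen for illustration, and the actual compute_mphf_seq code may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// All names below are hypothetical; they only illustrate the techniques.

// constexpr constants: the standard 64-bit FNV-1a parameters.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// Taking std::string_view lets callers pass a std::string, a literal, or a
// substring without copying. The function cannot throw, so it is noexcept,
// and constexpr allows evaluation at compile time for constant inputs.
constexpr std::uint64_t hash_key(std::string_view key) noexcept {
    std::uint64_t h = kFnvOffsetBasis;
    for (const char c : key) {
        h ^= static_cast<unsigned char>(c);
        h *= kFnvPrime;
    }
    return h;
}

// Checked at compile time: an empty key leaves the offset basis untouched.
static_assert(hash_key("") == kFnvOffsetBasis, "empty key hashes to the basis");

// Type alias for readability.
using HashList = std::vector<std::uint64_t>;

// Not noexcept: reserve() may throw std::bad_alloc.
HashList hash_all(const std::vector<std::string>& keys) {
    HashList hashes;
    hashes.reserve(keys.size());  // single allocation instead of repeated growth
    for (const std::string& key : keys) {
        hashes.push_back(hash_key(key));  // std::string converts to string_view, no copy
    }
    return hashes;
}

class HashStore {
public:
    // Sink parameter: the caller's vector is moved in rather than copied.
    explicit HashStore(HashList hashes) noexcept : hashes_(std::move(hashes)) {}

    std::size_t size() const noexcept { return hashes_.size(); }

    std::uint64_t at(std::size_t i) const noexcept {
        assert(i < hashes_.size());
        return hashes_[i];
    }

private:
    HashList hashes_;  // contiguous storage, friendly to the cache
};
```

A caller would build the list once with hash_all(keys) and hand it over with HashStore store(std::move(hashes)). One caveat worth keeping in mind during review: a std::string_view does not own its characters, so it must not outlive the string it points into.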
Test Results
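The changes were tested using the synthetic data generation tool ./gen_synthetic_data, run as ./gen_synthetic_data test.dat 100000000. In the logs below, the average lookup time drops from 0.304692 to 0.245929 usecs (about 19% lower) and the average base hash computation time drops from 0.0121976 to 0.00789278 usecs (about 35% lower). bits_per_key is unchanged at 2.61375, while stddev_lookup_time_percentage rises from 7.33947 to 10.0789.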
Before Optimization:
2024-05-29 17:15:37: Avg. 0.0121976 usecs per base hash computation
2024-05-29 17:15:37: Performing lookups
2024-05-29 17:20:42: Avg. 0.304692 usecs per lookup
avg_lookup_time 0.304692
stddev_lookup_time_percentage 7.33947
bits_per_key 2.61375
After Optimization:
2024-05-29 17:48:29: Avg. 0.00789278 usecs per base hash computation
2024-05-29 17:48:29: Performing lookups
2024-05-29 17:52:34: Avg. 0.245929 usecs per lookup
avg_lookup_time 0.245929
stddev_lookup_time_percentage 10.0789
bits_per_key 2.61375
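The relative change between the two runs can be checked with a few lines of arithmetic. The sketch below uses only the averages printed in the logs above. The helper relative_reduction and the constant names are hypothetical and are not part of the codebase.

```cpp
#include <cassert>

// Averages copied from the "Before Optimization" and "After Optimization" logs,
// in usecs.
constexpr double kLookupBefore = 0.304692;
constexpr double kLookupAfter = 0.245929;
constexpr double kBaseHashBefore = 0.0121976;
constexpr double kBaseHashAfter = 0.00789278;

// Fraction by which a timing shrank relative to its "before" value.
// Hypothetical helper, for illustration only.
double relative_reduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before;
}
```

With these inputs, relative_reduction(kLookupBefore, kLookupAfter) comes out at roughly 0.19 and relative_reduction(kBaseHashBefore, kBaseHashAfter) at roughly 0.35. Both figures come from a single run per configuration, so they are best read as indicative rather than as a precise measurement.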