abacus-develop

ABACUS HSE is much slower than QE (Useful information about how to use LibRI with ABACUS)

Status: Open. iduygnay opened this issue 1 year ago • 13 comments

Details

With the same structure and equivalent accuracy settings, this is the time QE took: [image]

This is ABACUS with exx_separate_loop = 0. If I comment out this line, the calculation is killed because it runs out of memory (OOM): [image]

The ABACUS version is 3.7.0. This is the run script: [image]

Task list for Issue attackers (only for developers)

  • [ ] Reproduce the performance issue on a similar system or environment.
  • [ ] Identify the specific section of the code causing the performance issue.
  • [ ] Investigate the issue and determine the root cause.
  • [ ] Research best practices and potential solutions for the identified performance issue.
  • [ ] Implement the chosen solution to address the performance issue.
  • [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
  • [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • [ ] Review and incorporate any relevant feedback from users or developers.
  • [ ] Merge the improved solution into the main codebase and notify the issue reporter.

iduygnay, Aug 23 '24

@iduygnay Can you try exx_separate_loop=1? @PeizeLin Any comments?

QuantumMisaka, Aug 24 '24

With exx_separate_loop=1 it also runs out of memory (OOM), and it is also slower than QE.

iduygnay, Aug 24 '24

@iduygnay I notice that you're using ABACUS 3.7.0. Can you update to the newest ABACUS with LibRI 0.2.0?

QuantumMisaka, Aug 25 '24

@iduygnay LibRI-v0.2.0 is many times more efficient than LibRI-v0.1.1, and ABACUS-v3.7.4 is the first version to officially support LibRI-v0.2.0. So it's strongly recommended to update both codes.

PeizeLin, Sep 11 '24

LibRI-0.2.0 is indeed more efficient than 0.1.1, but it is still slower than QE (QE completed the same calculation in 7 h): [image]

iduygnay, Sep 11 '24

@iduygnay What are the dependencies of your ABACUS and QE builds? And what parallel settings (number of MPI processes and OpenMP threads) do you use for HSE in ABACUS and in QE?

Also, what is your ABACUS INPUT? (I have no idea about QE, orz)

QuantumMisaka, Sep 11 '24

ABACUS run script: [image]

ABACUS INPUT (I also tried exx_separate_loop=1, but it did not converge and had almost the same runtime as the case without that line): [image]

QE input: [image]

QE run script: [image]

iduygnay, Sep 11 '24

@iduygnay

  • For mixing_method and mixing_beta, the defaults are recommended (broyden; sigma depends on the band gap of your system)
  • More OpenMP threads are recommended for HSE in ABACUS; I recommend 16 to 32 threads
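To make the 16 to 32 threads recommendation concrete, here is a small sketch that splits a node's cores into MPI ranks and OpenMP threads per rank. The function plan_hybrid is hypothetical (not an ABACUS tool), and the core count you pass in is whatever your node actually has.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABACUS): split a node's cores into
# MPI ranks x OpenMP threads per rank.
plan_hybrid() {
    total_cores=$1   # cores available on the node
    threads=$2       # desired OpenMP threads per MPI rank
    # Reject thread counts that do not divide the node evenly.
    if [ "$threads" -le 0 ] || [ $((total_cores % threads)) -ne 0 ]; then
        echo "error: $total_cores cores cannot be split evenly into $threads-thread ranks" >&2
        return 1
    fi
    # Only a note, not an error: other settings (e.g. 64 threads) are also used.
    if [ "$threads" -lt 16 ] || [ "$threads" -gt 32 ]; then
        echo "note: $threads threads per rank is outside the 16-32 range suggested above" >&2
    fi
    echo "mpirun -np $((total_cores / threads)) with OMP_NUM_THREADS=$threads"
}
```

For example, plan_hybrid 64 16 prints "mpirun -np 4 with OMP_NUM_THREADS=16", while a thread count that does not divide the core count is rejected.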

@PeizeLin Any other suggestions?

QuantumMisaka, Sep 11 '24

Multithreading has a significant impact on EXX memory usage and speed:

export OMP_NUM_THREADS=64
mpirun -np 1 abacus
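A slightly fuller run script for this setting might look like the sketch below. OMP_PLACES and OMP_PROC_BIND are standard OpenMP environment variables, but whether thread pinning helps on a particular cluster is an assumption worth testing; the run_abacus function name and the DRY_RUN switch are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch of a hybrid MPI/OpenMP launch following the suggestion above
# (1 MPI rank x 64 OpenMP threads); NP and OMP_NUM_THREADS can override it.
run_abacus() {
    np=${NP:-1}
    export OMP_NUM_THREADS="${OMP_NUM_THREADS:-64}"
    # Standard OpenMP pinning variables; their benefit is machine-dependent.
    export OMP_PLACES="${OMP_PLACES:-cores}"
    export OMP_PROC_BIND="${OMP_PROC_BIND:-close}"

    # Hypothetical DRY_RUN switch: print the command instead of launching it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "OMP_NUM_THREADS=$OMP_NUM_THREADS mpirun -np $np abacus"
    else
        mpirun -np "$np" abacus
    fi
}
```

With NP and OMP_NUM_THREADS unset, DRY_RUN=1 run_abacus prints "OMP_NUM_THREADS=64 mpirun -np 1 abacus".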

Slightly reduce the accuracy

exx_dm_threshold  1E-3
exx_ccp_rmesh_times  1

Using real (double) numbers instead of complex numbers in EXX will significantly improve speed, but I'm not sure whether it will cause symmetry errors for your STRU, so you should compare the results after the calculation:

exx_real_number  1
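One way to do that comparison is to run the same system twice, with exx_real_number 0 and 1, in separate directories and compare the final total energies. The sketch below assumes each run's log contains a line like " !FINAL_ETOT_IS -1234.5678 eV" (ABACUS's running_scf.log usually does, but check your version); the function names, the example paths, and the 1e-4 eV default tolerance are hypothetical.

```shell
#!/bin/sh
# Compare final total energies of two ABACUS runs (e.g. complex vs. real EXX).
# Assumption: each log has a line like " !FINAL_ETOT_IS -1234.5678 eV".
final_etot() {
    # Print the energy from the last matching line; fail if none is found.
    awk '/!FINAL_ETOT_IS/ { e = $2 } END { if (e == "") exit 1; print e }' "$1"
}

compare_etot() {
    tol=${3:-1e-4}   # tolerance in eV (hypothetical default)
    ea=$(final_etot "$1") || { echo "no energy found in $1" >&2; return 2; }
    eb=$(final_etot "$2") || { echo "no energy found in $2" >&2; return 2; }
    # awk does the floating-point math; exit 0 means |dE| <= tol, 1 otherwise.
    awk -v a="$ea" -v b="$eb" -v t="$tol" 'BEGIN {
        d = a - b; if (d < 0) d = -d
        printf "dE = %.6e eV\n", d
        exit (d <= t + 0 ? 0 : 1)
    }'
}

# Usage (hypothetical paths):
# compare_etot complex/OUT.ABACUS/running_scf.log real/OUT.ABACUS/running_scf.log
```

Checking only the total energy is a coarse test; band gaps or forces may also be worth comparing for your system.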

PeizeLin, Sep 11 '24

I've tried

export OMP_NUM_THREADS=64
mpirun -np 1 abacus

The speed improved, and the runtime is now about 24 hours, compared with 6.5 hours for QE: [image] I will also try the other parameters soon.

iduygnay, Oct 24 '24

I tried

exx_ccp_rmesh_times  1
exx_real_number  1

Now it takes 20 hours.

iduygnay, Oct 25 '24

hse.zip: the attached files are the QE and ABACUS inputs.

iduygnay, Jan 09 '25


symmetry=1 is useful for such small systems.
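To collect the INPUT-side suggestions from this thread in one place, a sketch like the one below prints them in ABACUS's "keyword value" format and lists which of them an existing INPUT already sets, so they can be edited there rather than duplicated. Only keywords mentioned in the thread are included; the function names are hypothetical, and since exx_real_number 1 and the looser EXX thresholds trade accuracy for speed, results should still be checked against a reference run.

```shell
#!/bin/sh
# Hypothetical helper: print the INPUT settings suggested in this thread.
#   symmetry 1             - useful for small systems
#   exx_dm_threshold 1E-3  - slightly lower EXX accuracy, faster
#   exx_ccp_rmesh_times 1  - slightly lower EXX accuracy, faster
#   exx_real_number 1      - real instead of complex EXX; verify the results
hse_speed_settings() {
    cat <<'EOF'
symmetry             1
exx_dm_threshold     1E-3
exx_ccp_rmesh_times  1
exx_real_number      1
EOF
}

# Hypothetical check: print each suggested keyword that the INPUT file "$1"
# already sets (at the start of a line), so it can be edited in place.
already_set() {
    for key in $(hse_speed_settings | awk '{ print $1 }'); do
        if grep -Eq "^[[:space:]]*${key}[[:space:]]" "$1"; then
            echo "$key"
        fi
    done
}
```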

linpeize, Sep 26 '25