Hard-to-reach SCF convergence for some magnetic elements & abnormally slow ABACUS `scf` calculations
Details
Please see the attached cases described by the title:

- Hard to converge even after a considerably long run (most are magnetic): Ni-hcp_uncov.zip, Mn-bcc_uncov.zip, Fe-fcc_uncov.zip, Cr-bcc_uncov.zip, Co-bcc_uncov.zip, Ce-bcc-uncov.zip
- Reach convergence but take hours (very slow single electronic steps): Sm-rho_very_slow.zip, V-fcc_very_slow.zip
INPUT:

```
INPUT_PARAMETERS
calculation        scf
basis_type         pw
symmetry           0
ecutwfc            100
scf_thr            1e-08
scf_nmax           200
cal_force          1
cal_stress         1
kspacing           0.08
pseudo_rcut        10
pseudo_mesh        1
ks_solver          dav
relax_nmax         100
force_thr          0.001
stress_thr         0.5
smearing_method    gaussian
smearing_sigma     0.01
```

machine_type: `c64_m128_cpu_H`
Task list for Issue attackers (only for developers)
- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
@ZLI-afk,
Following the ABACUS convergence troubleshooting manual (ABACUS收敛性问题解决手册), I tried reducing mixing_beta from 0.8 to 0.4/0.2 and increasing mixing_ndim from 8 to 15. You can check the results in the link.
In total I tried 4 combinations (the corresponding INPUT additions are sketched after the list), namely:
- `mixing_beta=0.4` and `mixing_ndim=8`
- `mixing_beta=0.4` and `mixing_ndim=15`
- `mixing_beta=0.2` and `mixing_ndim=8`
- `mixing_beta=0.2` and `mixing_ndim=15`
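For reference, the only change relative to the INPUT shown above is to add (or adjust) these two lines; the sketch below uses the `mixing_beta=0.2` / `mixing_ndim=15` combination, and the other combinations only change these two values (the 0.8 and 8 defaults quoted above are what they replace):

```
mixing_beta   0.2
mixing_ndim   15
```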
For Ni-hcp, it converges with all 4 combinations.
For Mn-bcc, it converges with all 4 combinations.
For Fe-fcc, it converges with 3 of the combinations; it only fails to converge for mixing_beta=0.4 and mixing_ndim=8.
For Cr-bcc, it converges with all 4 combinations.
For Co-bcc, it converges with all 4 combinations.
For Ce-bcc, it converges for mixing_beta=0.2 and mixing_ndim=8.
Actually, Ce-bcc is not hard to converge. On the contrary, it converges very easily, as you can see from the drho: the drho drops very quickly to 1e-7, but then fails to reach 1e-8. These results indicate that the Ce calculation is numerically unstable, which is likely caused by the pseudopotential. This instability can also trigger numerical errors in the iterative eigensolvers (such as the Davidson method); this is not a bug, but rather a characteristic of this kind of iterative numerical method. See Issue #4068 for more discussion.
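If anyone wants to check this quickly on the other cases, a small script along these lines can pull the DRHO column out of the SCF iteration table and report where it stalls. This is a minimal sketch; the log path and the exact column order ("ITER ETOT EDIFF DRHO TIME") are assumptions based on the SCF tables quoted later in this thread:

```python
import re

# Minimal sketch: extract the DRHO history from ABACUS SCF iteration lines of
# the form "DA1  -6.424983e+04  0.000000e+00  2.174e+00  7.761e+02".
# The log file path and column layout are assumptions, not guaranteed by ABACUS.
ITER_LINE = re.compile(r"^\s*[A-Z]{2}\d+\s")  # e.g. "DA1", "DA12", ...

def drho_history(logfile="OUT.ABACUS/running_scf.log"):
    drho = []
    with open(logfile) as f:
        for line in f:
            if ITER_LINE.match(line):
                cols = line.split()      # ITER, ETOT, EDIFF, DRHO, TIME
                drho.append(float(cols[3]))
    return drho

if __name__ == "__main__":
    history = drho_history()
    if history:
        print(f"{len(history)} SCF steps, min DRHO = {min(history):.3e}, "
              f"final DRHO = {history[-1]:.3e}")
```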
I have checked some examples calculated previously (ecutwfc is also 100 Ry).
| example | natom | nbands | nelec | kpoints | bohrium_machine (parallel cores) | cpu | ave scf_time per step (s) |
|---|---|---|---|---|---|---|---|
| 041_ZnMnGa | 49 | 290 | 481 | 63 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 305 |
| 043_RuSc | 30 | 223 | 370 | 112 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 530 |
| 055_ErAlNi | 24 | 217 | 360 | 152 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 278 |
| V(this issue) | 32 | 250 | 416 | 112 | c64_m128_cpu_H(64) | AMD EPYC 7452 32-Core Processor | 605 |
| Sm(this issue) | 24 | 159 | 264 | 172 | c64_m128_cpu_H(64) | AMD EPYC 7452 32-Core Processor | 1568 |
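As a very rough sanity check (not a rigorous cost model), one can normalize the average SCF step time by core count, band count, and k-point count from the table above; by this crude metric the V case (~1.4) and especially the Sm case (~3.7) are indeed several times more expensive than the earlier examples (~0.3-0.7). A minimal sketch with the numbers hard-coded from the table:

```python
# Very rough comparison: total core-seconds per SCF step, normalized by
# nbands x kpoints. Numbers are copied from the table above; this ignores
# natom, cell volume, pseudopotential hardness, etc.
cases = {
    #              avg step (s), cores, nbands, kpoints
    "041_ZnMnGa": (305,  32, 290,  63),
    "043_RuSc":   (530,  32, 223, 112),
    "055_ErAlNi": (278,  32, 217, 152),
    "V":          (605,  64, 250, 112),
    "Sm":         (1568, 64, 159, 172),
}
for name, (t, cores, nbands, nk) in cases.items():
    print(f"{name:11s}  {t * cores / (nbands * nk):5.2f} core-s / (band * k-point)")
```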
Is the average scf_time too high for V and Sm? Is there any way to solve this?
Yes, it seems abnormal for these two examples. I suspect the performance of c64_m128_cpu_H(64) is not good. I will try to use c32_m128_cpu (paratera) to test them.
I used c32_m128_cpu (paratera) to run example V with 32 cores in parallel; the first 3 SCF steps are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -6.424983e+04    0.000000e+00   2.174e+00   7.761e+02
DA2     -6.425507e+04   -5.239252e+00   2.121e+00   4.531e+02
DA3     -6.425613e+04   -1.065634e+00   8.683e+00   6.127e+02
```
While the results in this issue are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -6.424983e+04    0.000000e+00   2.174e+00   8.604e+02
DA2     -6.425507e+04   -5.239372e+00   2.121e+00   5.164e+02
DA3     -6.425613e+04   -1.063900e+00   8.689e+00   7.088e+02
```
As we can see, the performance of c32_m128_cpu (paratera) is better than that of c64_m128_cpu_H(64).
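For concreteness, the per-step ratios from the two tables above (times copied from the outputs shown; nothing else assumed):

```python
# Per-SCF-step wall time for the V example: c64_m128_cpu_H(64) vs c32_m128_cpu(32).
c64_times = [860.4, 516.4, 708.8]  # seconds, from the issue's output above
c32_times = [776.1, 453.1, 612.7]  # seconds, from the paratera run above
for step, (t64, t32) in enumerate(zip(c64_times, c32_times), start=1):
    print(f"step {step}: c64/c32 = {t64 / t32:.2f}")
```

So the c32 machine is roughly 11-16% faster per step despite using only half as many cores.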
Updating with the first 3 SCF steps of Sm on c32_m128_cpu:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -2.616618e+04    0.000000e+00   7.877e-02   4.249e+03
DA2     -2.616630e+04   -1.143728e-01   1.732e-02   1.742e+03
DA3     -2.616623e+04    6.750572e-02   1.187e-01   1.925e+03
```
The results in this issue are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -2.616618e+04    0.000000e+00   7.877e-02   3.811e+03
DA2     -2.616630e+04   -1.143728e-01   1.732e-02   1.372e+03
DA3     -2.616623e+04    6.750572e-02   1.187e-01   1.624e+03
```
Please see the latest V case, which fails to finish the SCF calculation with `KILLED BY SIGNAL: 6 (Aborted)`. Maybe this implies something.
V_failed_signal_6.zip
The error in this test comes from SchmitOrth in the Davidson solver:
```
abacus: /abacus-develop/source/module_hsolver/diago_david.cpp:947: void hsolver::DiagoDavid<>::SchmitOrth(const int &, const int, const int, psi::Psi<T, Device> &, const T *, T *, const int, const int) [T = std::complex<double>, Device = psi::DEVICE_CPU]: Assertion `psi_norm > 0.0' failed.
```
Usually, this is caused by numerical instability.
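To illustrate what the assertion guards against: SchmitOrth performs a Schmidt (Gram-Schmidt-style) orthogonalization of a new trial vector against the current Davidson basis, and when the trial vector is numerically (almost) linearly dependent on that basis, its norm after the projection collapses to essentially zero, which violates `psi_norm > 0.0`. A minimal NumPy sketch of the effect (illustration only, not ABACUS code):

```python
import numpy as np

# Illustration only (not ABACUS code): Gram-Schmidt on a trial vector that is
# numerically linearly dependent on an existing orthonormal basis. After the
# basis components are projected out, almost nothing is left, so the norm that
# is asserted to be positive can underflow to (numerically) zero.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((50, 5)))  # orthonormal columns
trial = basis @ rng.standard_normal(5)                  # lies entirely in span(basis)

residual = trial - basis @ (basis.T @ trial)            # project out the basis
print(f"norm after orthogonalization: {np.linalg.norm(residual):.3e}")  # ~1e-16
```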