Hard-to-reach SCF convergence for some magnetic elements & abnormally slow ABACUS `scf` calculations
Details
Please see the attached cases described by the title:

- Hard to converge even after a considerably long run (most are magnetic): Ni-hcp_uncov.zip, Mn-bcc_uncov.zip, Fe-fcc_uncov.zip, Cr-bcc_uncov.zip, Co-bcc_uncov.zip, Ce-bcc-uncov.zip
- Reach convergence but take hours (very slow single electronic steps): Sm-rho_very_slow.zip, V-fcc_very_slow.zip
INPUT:

```
INPUT_PARAMETERS
calculation        scf
basis_type         pw
symmetry           0
ecutwfc            100
scf_thr            1e-08
scf_nmax           200
cal_force          1
cal_stress         1
kspacing           0.08
pseudo_rcut        10
pseudo_mesh        1
ks_solver          dav
relax_nmax         100
force_thr          0.001
stress_thr         0.5
smearing_method    gaussian
smearing_sigma     0.01
```

machine_type: `c64_m128_cpu_H`
Task list for Issue attackers (only for developers)
- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
@ZLI-afk,
Following the ABACUS convergence troubleshooting manual (ABACUS收敛性问题解决手册), I tried reducing mixing_beta from 0.8 to 0.4/0.2 and increasing mixing_ndim from 8 to 15. You can check the results in the link.
In total I tried 4 combinations (the corresponding INPUT additions are sketched after the list), namely:
- `mixing_beta=0.4` and `mixing_ndim=8`
- `mixing_beta=0.4` and `mixing_ndim=15`
- `mixing_beta=0.2` and `mixing_ndim=8`
- `mixing_beta=0.2` and `mixing_ndim=15`
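For reference, the only change relative to the INPUT shown above is to add (or adjust) these two lines; the sketch below uses the `mixing_beta=0.2` / `mixing_ndim=15` combination, and the other combinations only change these two values (the 0.8 and 8 defaults quoted above are what they replace):

```
mixing_beta   0.2
mixing_ndim   15
```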
For Ni-hcp, it converges with all 4 combinations.
For Mn-bcc, it converges with all 4 combinations.
For Fe-fcc, it converges with 3 of the combinations; it only fails to converge for mixing_beta=0.4 and mixing_ndim=8.
For Cr-bcc, it converges with all 4 combinations.
For Co-bcc, it converges with all 4 combinations.
For Ce-bcc, it converges for mixing_beta=0.2 and mixing_ndim=8.
Actually, Ce-bcc is not hard to converge. On the contrary, it converges very easily, as you can see from the drho: the drho drops very quickly to 1e-7, but then fails to reach 1e-8. These results indicate that the Ce calculation is numerically unstable, which is likely caused by the pseudopotential. This instability can also trigger numerical errors in the iterative eigensolvers (such as the Davidson method); this is not a bug, but rather a characteristic of this kind of iterative numerical method. See Issue #4068 for more discussion.
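If anyone wants to check this quickly on the other cases, a small script along these lines can pull the DRHO column out of the SCF iteration table and report where it stalls. This is a minimal sketch; the log path and the exact column order ("ITER ETOT EDIFF DRHO TIME") are assumptions based on the SCF tables quoted later in this thread:

```python
import re

# Minimal sketch: extract the DRHO history from ABACUS SCF iteration lines of
# the form "DA1  -6.424983e+04  0.000000e+00  2.174e+00  7.761e+02".
# The log file path and column layout are assumptions, not guaranteed by ABACUS.
ITER_LINE = re.compile(r"^\s*[A-Z]{2}\d+\s")  # e.g. "DA1", "DA12", ...

def drho_history(logfile="OUT.ABACUS/running_scf.log"):
    drho = []
    with open(logfile) as f:
        for line in f:
            if ITER_LINE.match(line):
                cols = line.split()      # ITER, ETOT, EDIFF, DRHO, TIME
                drho.append(float(cols[3]))
    return drho

if __name__ == "__main__":
    history = drho_history()
    if history:
        print(f"{len(history)} SCF steps, min DRHO = {min(history):.3e}, "
              f"final DRHO = {history[-1]:.3e}")
```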
I have checked some examples calculated previously (ecutwfc is also 100 Ry).
| example | natom | nbands | nelec | kpoints | bohrium_machine (parallel cores) | cpu | ave scf_time per step (s) |
|---|---|---|---|---|---|---|---|
| 041_ZnMnGa | 49 | 290 | 481 | 63 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 305 |
| 043_RuSc | 30 | 223 | 370 | 112 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 530 |
| 055_ErAlNi | 24 | 217 | 360 | 152 | c32_m128_cpu(32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 278 |
| V(this issue) | 32 | 250 | 416 | 112 | c64_m128_cpu_H(64) | AMD EPYC 7452 32-Core Processor | 605 |
| Sm(this issue) | 24 | 159 | 264 | 172 | c64_m128_cpu_H(64) | AMD EPYC 7452 32-Core Processor | 1568 |
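As a very rough sanity check (not a rigorous cost model), one can normalize the average SCF step time by core count, band count, and k-point count from the table above; by this crude metric the V case (~1.4) and especially the Sm case (~3.7) are indeed several times more expensive than the earlier examples (~0.3-0.7). A minimal sketch with the numbers hard-coded from the table:

```python
# Very rough comparison: total core-seconds per SCF step, normalized by
# nbands x kpoints. Numbers are copied from the table above; this ignores
# natom, cell volume, pseudopotential hardness, etc.
cases = {
    #              avg step (s), cores, nbands, kpoints
    "041_ZnMnGa": (305,  32, 290,  63),
    "043_RuSc":   (530,  32, 223, 112),
    "055_ErAlNi": (278,  32, 217, 152),
    "V":          (605,  64, 250, 112),
    "Sm":         (1568, 64, 159, 172),
}
for name, (t, cores, nbands, nk) in cases.items():
    print(f"{name:11s}  {t * cores / (nbands * nk):5.2f} core-s / (band * k-point)")
```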
Is the average scf_time too high for V and Sm? Is there any way to solve this?
Yes, it seems abnormal for these two examples. I suspect the performance of c64_m128_cpu_H(64) is not good. I will try to use c32_m128_cpu (paratera) to test them.
I used c32_m128_cpu (paratera) to run example V with 32 cores in parallel; the first 3 SCF steps are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -6.424983e+04    0.000000e+00   2.174e+00   7.761e+02
DA2     -6.425507e+04   -5.239252e+00   2.121e+00   4.531e+02
DA3     -6.425613e+04   -1.065634e+00   8.683e+00   6.127e+02
```
While the results in this issue are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -6.424983e+04    0.000000e+00   2.174e+00   8.604e+02
DA2     -6.425507e+04   -5.239372e+00   2.121e+00   5.164e+02
DA3     -6.425613e+04   -1.063900e+00   8.689e+00   7.088e+02
```
As we can see, the performance of c32_m128_cpu (paratera) is better than that of c64_m128_cpu_H(64).
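For concreteness, the per-step ratios from the two tables above (times copied from the outputs shown; nothing else assumed):

```python
# Per-SCF-step wall time for the V example: c64_m128_cpu_H(64) vs c32_m128_cpu(32).
c64_times = [860.4, 516.4, 708.8]  # seconds, from the issue's output above
c32_times = [776.1, 453.1, 612.7]  # seconds, from the paratera run above
for step, (t64, t32) in enumerate(zip(c64_times, c32_times), start=1):
    print(f"step {step}: c64/c32 = {t64 / t32:.2f}")
```

So the c32 machine is roughly 11-16% faster per step despite using only half as many cores.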
Updating with the first 3 SCF steps of Sm on c32_m128_cpu:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -2.616618e+04    0.000000e+00   7.877e-02   4.249e+03
DA2     -2.616630e+04   -1.143728e-01   1.732e-02   1.742e+03
DA3     -2.616623e+04    6.750572e-02   1.187e-01   1.925e+03
```
The results in this issue are:

```
ITER    ETOT(eV)        EDIFF(eV)       DRHO        TIME(s)
DA1     -2.616618e+04    0.000000e+00   7.877e-02   3.811e+03
DA2     -2.616630e+04   -1.143728e-01   1.732e-02   1.372e+03
DA3     -2.616623e+04    6.750572e-02   1.187e-01   1.624e+03
```
Please see the latest V case, which fails to finish the SCF calculation with `KILLED BY SIGNAL: 6 (Aborted)`. Maybe this implies something.
V_failed_signal_6.zip
The error in this test comes from SchmitOrth in the Davidson solver:
```
abacus: /abacus-develop/source/module_hsolver/diago_david.cpp:947: void hsolver::DiagoDavid<>::SchmitOrth(const int &, const int, const int, psi::Psi<T, Device> &, const T *, T *, const int, const int) [T = std::complex<double>, Device = psi::DEVICE_CPU]: Assertion `psi_norm > 0.0' failed.
```
Usually, this is caused by numerical instability.
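To illustrate what the assertion guards against: SchmitOrth performs a Schmidt (Gram-Schmidt-style) orthogonalization of a new trial vector against the current Davidson basis, and when the trial vector is numerically (almost) linearly dependent on that basis, its norm after the projection collapses to essentially zero, which violates `psi_norm > 0.0`. A minimal NumPy sketch of the effect (illustration only, not ABACUS code):

```python
import numpy as np

# Illustration only (not ABACUS code): Gram-Schmidt on a trial vector that is
# numerically linearly dependent on an existing orthonormal basis. After the
# basis components are projected out, almost nothing is left, so the norm that
# is asserted to be positive can underflow to (numerically) zero.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((50, 5)))  # orthonormal columns
trial = basis @ rng.standard_normal(5)                  # lies entirely in span(basis)

residual = trial - basis @ (basis.T @ trial)            # project out the basis
print(f"norm after orthogonalization: {np.linalg.norm(residual):.3e}")  # ~1e-16
```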