Request: Better convergence of HSE in magnetic system
Details
I tested HSE SCF on a magnetic system. The example is the Fe-bcc conventional cell:
ATOMIC_SPECIES
Fe 55.845 Fe_ONCV_PBE-1.0.upf upf201
NUMERICAL_ORBITAL
Fe_gga_8au_100Ry_4s2p2d1f.orb
LATTICE_CONSTANT
1.889726
LATTICE_VECTORS
2.8301511117 0.0000000000 0.0000000000 #latvec1
0.0000000000 2.8301511117 0.0000000000 #latvec2
0.0000000000 -0.0000000000 2.8301511117 #latvec3
ATOMIC_POSITIONS
Direct
Fe #label
2 #magnetism
2 #number of atoms
0.0000000000 0.0000000000 0.0000000000 m 1 1 1
0.5000000000 0.5000000000 0.5000000000 m 1 1 1
The KPT mesh is 9 9 9.
Information:
- ABACUS version: 3.4.4
- Commit: 5f9d472 (Mon Dec 4 14:10:21 2023 +0800)
- Dependence: Intel-OneAPI and Intel-toolchain
- LibRI and LibComm: latest version before Nov 18

My first INPUT was:
#Parameters (1.General)
suffix Fe # suffix of OUTPUT DIR
nspin 2 # 1/2/4 4 for SOC
symmetry 0 # 0/1 1 for open, default
esolver_type ksdft # ksdft, ofdft, sdft, tddft, lj, dp
dft_functional hse # same as upf file, can be lda/pbe/scan/hf/pbe0/hse
ks_solver genelpa # default for ksdft-lcao
vdw_method none # none, d3, d3_bj
#Parameters (2.Iteration)
calculation scf # scf relax cell-relax md
ecutwfc 100
scf_thr 1e-7
scf_nmax 300
#Parameters (3.Basis)
basis_type lcao # lcao or pw
#Parameters (4.Smearing)
smearing_method mp # mp/gaussian/fixed
smearing_sigma 0.002 # Rydberg
#Parameters (5.Mixing)
mixing_type broyden # pulay/broyden
#Parameters (6.Calculation)
cal_force 1
cal_stress 1
out_stru 1 # print STRU in OUT
out_chg 1 # print CHG or not
out_bandgap 1
out_mul 1
With this INPUT it is very hard to converge to `scf_thr 1e-7`. The run cannot even reach `scf_thr 1e-6` within 5 days of calculation using `OMP_NUM_THREADS=16 mpirun -np 4 abacus` on Intel-8358:
# After more than 700 lines of print-out and 4-days calculation
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.291e-06 9.196e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 1.364268e-09 7.188e-07 8.863e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 1.468676e-09 3.518e-07 8.800e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 5.839128e-10 2.326e-07 8.802e+00
GE5 5.32e+00 5.81e+00 -6.437418e+03 -2.100539e-09 3.236e-08 8.843e+00
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.058e-06 9.111e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 -1.546015e-09 5.929e-07 8.805e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 -1.840679e-10 2.948e-07 8.871e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 1.423819e-09 4.995e-08 8.820e+00
After seeing #3103, I added one parameter to my INPUT:
mixing_gg0 0.0
After that, the convergence is better. In a 2-day calculation using `OMP_NUM_THREADS=24 mpirun -np 2 abacus` on Intel-8162, the SCF converges to `scf_thr 1e-6`, but not to `scf_thr 1e-7`:
START CHARGE : atomic
DONE(177.792 SEC) : INIT SCF
ITER TMAG AMAG ETOT(eV) EDIFF(eV) DRHO TIME(s)
GE1 4.01e+00 4.01e+00 -6.440073e+03 0.000000e+00 4.826e-02 4.429e+00
GE2 4.31e+00 4.41e+00 -6.440405e+03 -3.311553e-01 1.996e-02 3.688e+00
GE3 4.33e+00 4.43e+00 -6.440409e+03 -4.691903e-03 5.726e-03 3.677e+00
GE4 4.33e+00 4.43e+00 -6.440409e+03 2.332581e-04 3.079e-03 3.684e+00
GE5 4.33e+00 4.43e+00 -6.440409e+03 -5.472160e-05 1.219e-03 3.626e+00
GE6 4.33e+00 4.43e+00 -6.440409e+03 -1.579811e-05 1.703e-04 3.681e+00
GE7 4.33e+00 4.43e+00 -6.440409e+03 -2.383246e-07 6.439e-05 3.724e+00
GE8 4.33e+00 4.43e+00 -6.440409e+03 -6.277874e-08 2.805e-05 3.635e+00
GE9 4.33e+00 4.43e+00 -6.440409e+03 -2.755682e-08 9.261e-06 3.668e+00
GE10 4.33e+00 4.43e+00 -6.440409e+03 1.987624e-10 9.984e-07 3.717e+00
GE11 4.33e+00 4.43e+00 -6.440409e+03 1.256766e-09 1.477e-07 3.667e+00
GE12 4.33e+00 4.43e+00 -6.440409e+03 -2.078884e-09 8.750e-08 3.641e+00
Updating EXX and rerun SCF
GE1 5.07e+00 5.25e+00 -6.432274e+03 0.000000e+00 6.975e-02 1.732e+01
GE2 5.12e+00 5.38e+00 -6.437178e+03 -4.903432e+00 5.335e-02 1.714e+01
GE3 5.08e+00 5.37e+00 -6.437337e+03 -1.595823e-01 2.761e-02 1.717e+01
GE4 5.08e+00 5.36e+00 -6.436762e+03 5.755460e-01 2.955e-02 1.724e+01
GE5 5.18e+00 5.45e+00 -6.437070e+03 -3.075961e-01 1.282e-02 1.730e+01
GE6 5.20e+00 5.46e+00 -6.437078e+03 -8.548606e-03 8.137e-03 1.715e+01
GE7 5.19e+00 5.45e+00 -6.437053e+03 2.523551e-02 9.021e-03 1.717e+01
GE8 5.22e+00 5.47e+00 -6.437049e+03 4.194422e-03 4.162e-03 1.725e+01
GE9 5.25e+00 5.49e+00 -6.437052e+03 -2.974158e-03 3.035e-04 1.720e+01
GE10 5.25e+00 5.49e+00 -6.437052e+03 1.164049e-05 3.154e-04 1.713e+01
GE11 5.25e+00 5.49e+00 -6.437052e+03 -1.927004e-05 9.251e-05 1.714e+01
GE12 5.25e+00 5.49e+00 -6.437052e+03 4.742927e-06 1.342e-04 1.723e+01
GE13 5.25e+00 5.49e+00 -6.437052e+03 -3.654831e-06 1.064e-04 1.724e+01
GE14 5.25e+00 5.49e+00 -6.437052e+03 -1.292602e-06 2.761e-06 1.720e+01
GE15 5.25e+00 5.49e+00 -6.437052e+03 -4.918788e-10 1.088e-06 1.726e+01
GE16 5.25e+00 5.49e+00 -6.437052e+03 -1.480277e-09 5.592e-07 1.722e+01
GE17 5.25e+00 5.49e+00 -6.437052e+03 3.408349e-09 1.277e-07 1.720e+01
GE18 5.25e+00 5.49e+00 -6.437052e+03 3.209587e-10 1.536e-08 1.722e+01
Updating EXX and rerun SCF
GE1 5.30e+00 5.66e+00 -6.437386e+03 0.000000e+00 7.783e-03 1.756e+01
GE2 5.30e+00 5.70e+00 -6.437389e+03 -2.905916e-03 3.097e-03 1.776e+01
GE3 5.30e+00 5.69e+00 -6.437389e+03 -1.426553e-04 3.709e-04 1.768e+01
GE4 5.30e+00 5.69e+00 -6.437389e+03 -3.933669e-07 1.830e-04 1.763e+01
GE5 5.30e+00 5.69e+00 -6.437389e+03 -1.916378e-07 6.337e-05 1.758e+01
GE6 5.30e+00 5.69e+00 -6.437389e+03 -5.438509e-08 6.068e-06 1.773e+01
GE7 5.30e+00 5.69e+00 -6.437389e+03 6.844540e-10 4.172e-06 1.765e+01
GE8 5.30e+00 5.69e+00 -6.437389e+03 -2.401390e-09 2.932e-06 1.760e+01
GE9 5.30e+00 5.69e+00 -6.437389e+03 -1.980663e-09 3.465e-07 1.768e+01
GE10 5.30e+00 5.69e+00 -6.437389e+03 1.095900e-09 4.516e-08 1.761e+01
Updating EXX and rerun SCF
GE1 5.30e+00 5.75e+00 -6.437412e+03 0.000000e+00 2.970e-03 1.772e+01
GE2 5.30e+00 5.77e+00 -6.437412e+03 -5.071874e-04 1.115e-03 1.761e+01
GE3 5.30e+00 5.76e+00 -6.437412e+03 -3.600643e-05 3.660e-04 1.766e+01
GE4 5.30e+00 5.76e+00 -6.437412e+03 1.002332e-06 1.333e-04 1.767e+01
GE5 5.30e+00 5.76e+00 -6.437412e+03 -3.536508e-07 3.344e-05 1.765e+01
GE6 5.30e+00 5.76e+00 -6.437412e+03 -1.065892e-08 3.677e-06 1.761e+01
GE7 5.30e+00 5.76e+00 -6.437412e+03 1.508119e-10 2.340e-06 1.777e+01
GE8 5.30e+00 5.76e+00 -6.437412e+03 -2.848412e-09 1.372e-06 1.762e+01
GE9 5.30e+00 5.76e+00 -6.437412e+03 -6.697595e-10 5.157e-07 1.766e+01
GE10 5.30e+00 5.76e+00 -6.437412e+03 2.343385e-10 2.126e-07 1.772e+01
GE11 5.30e+00 5.76e+00 -6.437412e+03 2.143849e-09 3.190e-08 1.777e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 8.249e-04 1.792e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -3.188180e-05 3.782e-04 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -7.317703e-08 1.303e-04 1.774e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -3.181451e-07 5.785e-05 1.770e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 1.346944e-08 1.282e-05 1.783e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 -2.061869e-09 2.488e-06 1.767e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 2.597832e-09 4.422e-07 1.771e+01
GE8 5.29e+00 5.78e+00 -6.437418e+03 3.727761e-10 1.378e-07 1.783e+01
GE9 5.29e+00 5.78e+00 -6.437418e+03 -5.916467e-10 5.191e-08 1.774e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.048e-04 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.439515e-06 9.945e-05 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 2.207036e-08 4.256e-05 1.779e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 1.388243e-09 1.148e-05 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -1.028305e-08 4.965e-06 1.766e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 -4.977566e-09 4.509e-07 1.777e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 8.770292e-10 1.506e-07 1.783e+01
GE8 5.29e+00 5.78e+00 -6.437418e+03 -9.288467e-10 6.491e-08 1.770e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 6.077e-05 1.769e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.134252e-07 3.019e-05 1.777e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 8.373541e-09 1.607e-05 1.777e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.111367e-09 2.498e-06 1.768e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -3.221961e-09 4.133e-07 1.770e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 4.555293e-10 1.491e-07 1.771e+01
GE7 5.29e+00 5.78e+00 -6.437418e+03 2.135342e-09 4.883e-08 1.783e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.286e-05 1.784e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.029078e-08 1.168e-05 1.772e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.047176e-09 6.660e-06 1.773e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.583137e-10 1.001e-06 1.789e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -1.795822e-09 4.420e-07 1.777e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 1.625675e-09 7.378e-08 1.779e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 1.176e-05 1.768e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -6.389011e-09 5.673e-06 1.767e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.296982e-09 3.038e-06 1.780e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 3.175557e-09 3.255e-06 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -3.668210e-09 2.879e-07 1.769e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 7.262173e-10 4.905e-08 1.772e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 6.956e-06 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.089712e-09 3.181e-06 1.795e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.856147e-11 1.484e-06 1.772e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 4.439284e-10 7.081e-07 1.771e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -5.777256e-10 2.457e-07 1.781e+01
GE6 5.29e+00 5.78e+00 -6.437418e+03 6.627990e-10 3.775e-08 1.774e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.987e-06 1.776e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 5.614843e-10 1.749e-06 1.779e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.237431e-11 7.742e-07 1.771e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.139210e-09 9.115e-07 1.785e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 2.412990e-10 6.300e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.462e-06 1.779e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 6.380504e-10 1.072e-06 1.781e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.345706e-09 5.480e-07 1.785e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.142302e-10 4.647e-07 1.776e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -2.590871e-10 4.279e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 1.403e-06 1.777e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 1.235111e-09 6.003e-07 1.775e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 8.615614e-10 2.236e-07 1.787e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.662025e-09 1.244e-07 1.777e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 8.360393e-10 4.124e-08 1.779e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 9.645e-07 1.775e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.137663e-09 6.704e-07 1.771e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 9.613292e-10 4.905e-07 1.784e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 6.372770e-10 1.410e-07 1.780e+01
GE5 5.29e+00 5.78e+00 -6.437418e+03 -5.181742e-10 2.714e-08 1.778e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 5.150e-07 1.781e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.575403e-10 3.110e-07 1.782e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.438514e-09 2.384e-07 1.790e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -8.399063e-10 7.898e-08 1.780e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.857e-07 1.780e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.633409e-09 5.688e-07 1.778e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -1.924205e-09 1.518e-07 1.777e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 3.443925e-09 5.881e-08 1.782e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 3.686e-07 1.778e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -1.023974e-09 1.722e-07 1.784e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 2.084298e-09 7.166e-08 1.777e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.508e-07 1.841e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -7.285375e-10 5.146e-07 1.832e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.717709e-09 9.339e-08 1.835e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.401e-07 1.783e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 2.266046e-10 2.545e-07 1.782e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -5.599375e-10 1.674e-07 1.791e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -1.832945e-10 5.861e-08 1.780e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.153e-07 1.786e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 -2.714614e-10 2.968e-07 1.779e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 -6.473311e-10 1.489e-07 1.779e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 2.733949e-09 4.245e-08 1.788e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.514e-07 1.780e+01
GE2 5.29e+00 5.78e+00 -6.437418e+03 1.046403e-09 2.553e-07 1.787e+01
GE3 5.29e+00 5.78e+00 -6.437418e+03 1.063417e-09 1.525e-07 1.774e+01
GE4 5.29e+00 5.78e+00 -6.437418e+03 -5.251348e-10 4.837e-08 1.787e+01
Memory consumption is 50 GB during the calculation. Is this performance normal for this system? Can it be improved?
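To compare runs like the two above, it can help to condense the screen output into one record per outer EXX loop. The following is a minimal sketch, assuming the `nspin 2` log layout shown in this issue (columns ITER TMAG AMAG ETOT(eV) EDIFF(eV) DRHO TIME(s)). `summarize_outer_loops` is a hypothetical helper name, not part of ABACUS, and the sample text is copied from the first log above.

```python
import re

# One inner SCF line: GE<n> TMAG AMAG ETOT EDIFF DRHO TIME (nspin 2 layout, assumed)
ITER_RE = re.compile(
    r"^\s*GE(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)

def summarize_outer_loops(log_text):
    """Return one dict per outer loop: inner steps, last DRHO, summed TIME(s)."""
    loops = []
    current = None
    for line in log_text.splitlines():
        if "Updating EXX and rerun SCF" in line:
            current = None  # the next GE line opens a new outer loop
            continue
        match = ITER_RE.match(line)
        if match is None:
            continue
        drho = float(match.group(6))
        seconds = float(match.group(7))
        if current is None:
            current = {"steps": 0, "last_drho": drho, "seconds": 0.0}
            loops.append(current)
        current["steps"] += 1
        current["last_drho"] = drho
        current["seconds"] += seconds
    return loops

SAMPLE = """\
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.291e-06 9.196e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 1.364268e-09 7.188e-07 8.863e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 1.468676e-09 3.518e-07 8.800e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 5.839128e-10 2.326e-07 8.802e+00
GE5 5.32e+00 5.81e+00 -6.437418e+03 -2.100539e-09 3.236e-08 8.843e+00
Updating EXX and rerun SCF
GE1 5.32e+00 5.81e+00 -6.437418e+03 0.000000e+00 1.058e-06 9.111e+00
GE2 5.32e+00 5.81e+00 -6.437418e+03 -1.546015e-09 5.929e-07 8.805e+00
GE3 5.32e+00 5.81e+00 -6.437418e+03 -1.840679e-10 2.948e-07 8.871e+00
GE4 5.32e+00 5.81e+00 -6.437418e+03 1.423819e-09 4.995e-08 8.820e+00
"""

for index, loop in enumerate(summarize_outer_loops(SAMPLE), start=1):
    print(index, loop["steps"], loop["last_drho"], round(loop["seconds"], 3))
```

Note that TIME(s) covers only the inner SCF steps. In the logs without a timing on the `Updating EXX and rerun SCF` line, the cost of the EXX update itself is not visible to this script.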
There are also some usability problems with HSE:
- Nothing is printed to stdout or running*.log during the EXX process (apart from the `Updating EXX and rerun SCF` notice), which gives the impression that the calculation is stuck. Could more information be printed, such as the time consumed in the EXX process and some key steps?
- How can I restart an HSE SCF calculation properly if a complete SCF is not done? Because the total SCF is not done, the charge file is not written, so I am trying to use the wavefunction file and the restart file. However, the HSE process calculates a PBE SCF first no matter whether `exx_separate_loop` is 0 or 1. If I directly use the wfc or restart file from a half-finished HSE run, will the initialization be useless because of the first PBE process?
- How should I set the MPI and OMP numbers for the best performance (if memory permits and the number of physical cores is fixed)? Using more OMP threads reduces the memory cost, but from my observation of the CPU status on the HPC server, the EXX process seems to be parallelized mainly by MPI at times.
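For the MPI/OMP question, one practical approach is to benchmark every split of a fixed core count. The sketch below only enumerates the candidate splits and prints launch lines in the same form used in this issue. `mpi_omp_splits` and `launch_command` are hypothetical helper names, and which split is fastest or fits in memory still has to be measured.

```python
def mpi_omp_splits(total_cores):
    """Return (mpi_ranks, omp_threads) pairs whose product equals total_cores."""
    return [
        (total_cores // omp_threads, omp_threads)
        for omp_threads in range(1, total_cores + 1)
        if total_cores % omp_threads == 0
    ]

def launch_command(mpi_ranks, omp_threads, binary="abacus"):
    """Format a launch line in the style used in this issue."""
    return f"OMP_NUM_THREADS={omp_threads} mpirun -np {mpi_ranks} {binary}"

# 64 cores matches the first run above (16 threads x 4 ranks).
for ranks, threads in mpi_omp_splits(64):
    print(launch_command(ranks, threads))
```

Running a short calculation (for example with a small `scf_nmax`) for each printed line and comparing the EXX update time and peak memory would show where the trade-off lies on a given machine.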
Task list for Issue attackers (only for developers)
- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
- How can I restart HSE SCF calculation properly if a complete SCF is not done? Because total SCF is not done, charge file will not be written, I'm trying using wavefunction file and restart file. However, due to HSE process will calculate PBE SCF first no matter `exx_separate_loop` is 0 or 1, if I directly use wfc or restart file from half-calculated HSE process, will the initialization useless because of the first PBE process ?
From my tests so far, an HSE calculation CANNOT be restarted from the wfc file or the restart file. They can only restart the PBE part.
@dyzheng @PeizeLin Could you look into this problem together?
I completed an HSE SCF calculation for the Fe-bcc system above using `scf_thr 1e-6` and `mixing_gg0 0`. The time cost is below:
TIME STATISTICS
--------------------------------------------------------------------------------------
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
--------------------------------------------------------------------------------------
total 137589.57 9 15287.73 100.00
Driver reading 0.02 1 0.02 0.00
Input Init 0.01 1 0.01 0.00
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 137589.55 1 137589.55 100.00
UnitCell check_tau 0.00 1 0.00 0.00
PW_Basis_Sup setuptransform 0.00 1 0.00 0.00
PW_Basis_Sup distributeg 0.00 1 0.00 0.00
mymath heapsort 0.00 4 0.00 0.00
PW_Basis_K setuptransform 0.02 1 0.02 0.00
PW_Basis_K distributeg 0.00 1 0.00 0.00
PW_Basis setup_struc_factor 0.02 1 0.02 0.00
ORB_control read_orb_first 0.08 1 0.08 0.00
LCAO_Orbitals Read_Orbitals 0.08 1 0.08 0.00
NOrbital_Lm extra_uniform 43.58 16798 0.00 0.03
Mathzone_Add1 SplineD2 0.20 16798 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.80 16798 0.00 0.00
Mathzone_Add1 Uni_Deriv_Phi 40.54 16798 0.00 0.03
Sphbes Spherical_Bessel 0.02 6030 0.00 0.00
Exx_LRI init 118.34 1 118.34 0.09
Matrix_Orbs21 init 11.41 2 5.70 0.01
ORB_gaunt_table init_Gaunt_CH 0.94 3 0.31 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.47 408058 0.00 0.00
ORB_gaunt_table init_Gaunt 9.68 3 3.23 0.01
ORB_gaunt_table Get_Gaunt_SH 16.71 28208149 0.00 0.01
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 93.67 2 46.83 0.07
ORB_table_phi cal_ST_Phi12_R 49.66 31655 0.00 0.04
LRI_CV set_orbitals 61.76 1 61.76 0.04
Matrix_Orbs11 init 0.28 1 0.28 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 11.38 1 11.38 0.01
ppcell_vl init_vloc 0.00 1 0.00 0.00
Ions opt_ions 137470.74 1 137470.74 99.91
ESolver_KS_LCAO Run 82117.20 1 82117.20 59.68
ESolver_KS_LCAO beforescf 91.86 1 91.86 0.07
ESolver_KS_LCAO beforesolver 17.00 1 17.00 0.01
ESolver_KS_LCAO set_matrix_grid 16.98 1 16.98 0.01
atom_arrange search 0.00 1 0.00 0.00
Grid_Technique init 16.97 1 16.97 0.01
Grid_BigCell grid_expansion_index 0.01 2 0.00 0.00
Record_adj for_2d 0.01 1 0.01 0.00
Grid_Driver Find_atom 0.01 402 0.00 0.00
LCAO_Hamilt grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.00 1 0.00 0.00
OverlapNew initialize_SR 0.00 1 0.00 0.00
EkineticNew initialize_HR 0.00 1 0.00 0.00
NonlocalNew initialize_HR 0.00 1 0.00 0.00
Charge set_rho_core 0.00 1 0.00 0.00
Charge atomic_rho 0.04 1 0.04 0.00
PW_Basis_Sup recip2real 0.66 882 0.00 0.00
PW_Basis_Sup gathers_scatterp 0.42 882 0.00 0.00
Potential init_pot 0.05 1 0.05 0.00
Potential update_from_charge 17.10 97 0.18 0.01
Potential cal_fixed_v 0.00 1 0.00 0.00
PotLocal cal_fixed_v 0.00 1 0.00 0.00
Potential cal_v_eff 17.09 97 0.18 0.01
H_Hartree_pw v_hartree 0.23 97 0.00 0.00
PW_Basis_Sup real2recip 1.22 891 0.00 0.00
PW_Basis_Sup gatherp_scatters 1.00 891 0.00 0.00
PotXC cal_v_eff 16.85 97 0.17 0.01
XC_Functional v_xc 58072.22 54 1075.41 42.21
Potential interpolate_vrs 0.00 97 0.00 0.00
Exx_LRI cal_exx_ions 74.76 1 74.76 0.05
LRI_CV cal_datas 43.30 3 14.43 0.03
H_Ewald_pw compute_ewald 0.01 1 0.01 0.00
HSolverLCAO solve 2210.18 96 23.02 1.61
HamiltLCAO updateHk 1006.91 70080 0.01 0.73
OperatorLCAO init 22.59 273020 0.00 0.02
Veff contributeHR 16.24 192 0.08 0.01
Gint_interface cal_gint 19.10 290 0.07 0.01
Gint_interface cal_gint_vlocal 11.71 192 0.06 0.01
Gint_Tools cal_psir_ylm 0.68 7776 0.00 0.00
Gint_k transfer_pvpR 4.53 192 0.02 0.00
OverlapNew calculate_SR 0.09 1 0.09 0.00
OverlapNew contributeHk 24.81 70080 0.00 0.02
EkineticNew contributeHR 0.16 192 0.00 0.00
EkineticNew calculate_HR 0.16 1 0.16 0.00
NonlocalNew contributeHR 0.17 192 0.00 0.00
NonlocalNew calculate_HR 0.14 1 0.14 0.00
OperatorLCAO contributeHk 44.18 70080 0.00 0.03
HSolverLCAO hamiltSolvePsiK 1092.18 70080 0.02 0.79
DiagoElpa elpa_solve 941.99 70080 0.01 0.68
ElecStateLCAO psiToRho 110.81 96 1.15 0.08
elecstate cal_dm 73.74 97 0.76 0.05
psiMulPsiMpi pdgemm 72.28 70810 0.00 0.05
DensityMatrix cal_DMR 9.99 97 0.10 0.01
Local_Orbital_wfc wfc_2d_to_grid 46.65 81030 0.00 0.03
Gint transfer_DMR 1.07 96 0.01 0.00
Gint_interface cal_gint_rho 7.13 96 0.07 0.01
Charge_Mixing get_drho 0.12 96 0.00 0.00
Charge mix_rho 0.15 81 0.00 0.00
Charge Broyden_mixing 0.10 81 0.00 0.00
ModuleIO write_wfc_nao_complex 28.07 10950 0.00 0.02
Exx_LRI cal_exx_elec 79763.06 14 5697.36 57.97
RI_2D_Comm split_m2D_ktoR 54.60 14 3.90 0.04
RI_2D_Comm add_Hexx 918.97 62780 0.01 0.67
XC_Functional v_xc_libxc 16.72 86 0.19 0.01
Exx_LRI write_Hexxs 0.19 1 0.19 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
HamiltLCAO updateSk 0.27 730 0.00 0.00
Force_Stress_LCAO getForceStress 55353.54 1 55353.54 40.23
Forces cal_force_loc 0.00 1 0.00 0.00
Forces cal_force_ew 0.00 1 0.00 0.00
Forces cal_force_cc 0.00 1 0.00 0.00
Forces cal_force_scc 0.01 1 0.01 0.00
Stress_Func stress_loc 0.00 1 0.00 0.00
Stress_Func stress_har 0.00 1 0.00 0.00
Stress_Func stress_ewa 0.00 1 0.00 0.00
Stress_Func stress_cc 0.00 1 0.00 0.00
Stress_Func stress_gga 0.03 1 0.03 0.00
Force_LCAO_k ftable_k 3.31 1 3.31 0.00
Force_LCAO_k allocate_k 0.79 1 0.79 0.00
LCAO_gen_fixedH b_NL_mu_new 0.40 1 0.40 0.00
Force_LCAO_k cal_foverlap_k 1.12 1 1.12 0.00
Force_LCAO_k cal_edm_2d 1.12 1 1.12 0.00
DensityMatrix sum_DMR_spin 0.00 1 0.00 0.00
Force_LCAO_k cal_ftvnl_dphi_k 0.00 1 0.00 0.00
Force_LCAO_k cal_fvl_dphi_k 0.26 1 0.26 0.00
Gint_interface cal_gint_force 0.26 2 0.13 0.00
Gint_Tools cal_dpsir_ylm 0.08 54 0.00 0.00
Gint_Tools cal_dpsirr_ylm 0.01 54 0.00 0.00
Force_LCAO_k cal_fvnl_dbeta_k_new 0.94 1 0.94 0.00
Exx_LRI cal_exx_force 38787.02 1 38787.02 28.19
Exx_LRI cal_exx_stress 16563.17 1 16563.17 12.04
ModuleIO write_istate_info 0.05 1 0.05 0.00
--------------------------------------------------------------------------------------
----------------------------------------------------------
START Time : Sat Dec 16 07:12:39 2023
FINISH Time : Sun Dec 17 21:25:49 2023
TOTAL Time : 137590
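Most of the time is spent in EXX: `cal_exx_elec` takes 57.97% of the total, and the EXX force and stress calculations (`cal_exx_force` and `cal_exx_stress`) take another 28.19% and 12.04%.
I think there is still a lot of room for performance improvement.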
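The EXX share can be read off the TIME STATISTICS table with a few lines of Python. This is a minimal sketch, assuming each data row ends with the four numeric columns TIME(Sec), CALLS, AVG(Sec), PER(%), as in the table above. `parse_time_rows` is a hypothetical helper name, and the sample rows are copied from that table.

```python
def parse_time_rows(text):
    """Parse TIME STATISTICS rows into (name, seconds, calls, percent) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # separators and blank lines
        try:
            seconds = float(parts[-4])
            calls = int(parts[-3])
            float(parts[-2])  # AVG(Sec), validated but not stored
            percent = float(parts[-1])
        except ValueError:
            continue  # header line
        name = " ".join(parts[:-4])
        rows.append((name, seconds, calls, percent))
    return rows

SAMPLE = """\
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
total 137589.57 9 15287.73 100.00
HSolverLCAO solve 2210.18 96 23.02 1.61
Exx_LRI cal_exx_elec 79763.06 14 5697.36 57.97
Exx_LRI cal_exx_force 38787.02 1 38787.02 28.19
Exx_LRI cal_exx_stress 16563.17 1 16563.17 12.04
"""

rows = parse_time_rows(SAMPLE)
total = next(sec for name, sec, _, _ in rows if name == "total")
exx = sum(sec for name, sec, _, _ in rows if name.startswith("Exx_LRI cal_exx_"))
exx_share = exx / total
print(f"EXX elec + force + stress: {exx:.2f} s, {100 * exx_share:.1f}% of total")
```

The timers in the full table are nested (for example `Driver driver_line` contains almost everything), so percentages should only be summed for entries that do not contain each other, as is assumed here for the three `cal_exx_*` rows.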
Here are some tests of convergence steps.
| gg0 | loop0,broyden | loop0,pulay | loop1,broyden | loop1,pulay |
|---|---|---|---|---|
| 0.0 | 36 | 35 | 22 | 22 |
| 0.2 | 30 | 36 | 19 | 21 |
| 0.4 | 31 | 46 | 19 | 19 |
| 0.6 | 27 | 28 | 21 | 23 |
| 0.8 | 29 | 34 | 19 | 19 |
| 1.0 | 27 | 27 | 20 | 20 |
It seems that in this system, gg0 does not affect the convergence speed.
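A quick way to check this reading of the table is to average each column over the gg0 values. The sketch below uses only the numbers in the table above. Reading `loop0` and `loop1` as `exx_separate_loop` 0 and 1 is an assumption, since the comment does not define them.

```python
from statistics import mean

GG0_VALUES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# Convergence steps copied from the table, one list per column.
STEPS = {
    "loop0,broyden": [36, 30, 31, 27, 29, 27],
    "loop0,pulay": [35, 36, 46, 28, 34, 27],
    "loop1,broyden": [22, 19, 19, 21, 19, 20],
    "loop1,pulay": [22, 21, 19, 23, 19, 20],
}

summary = {
    name: (mean(values), min(values), max(values))
    for name, values in STEPS.items()
}

for name, (avg, low, high) in summary.items():
    print(f"{name:>14}: mean {avg:.1f}, range {low}-{high}")
```

In these numbers the gap between the `loop0` and `loop1` columns is larger than the variation with gg0 inside the `loop1` columns, although `loop0,pulay` scatters noticeably (27 to 46 steps).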
But there are two other things that are more important:
- The number of loops needed for final convergence: if `mixing_gg0 1.0` is set as default, the HSE calculation CANNOT converge to DRHO=1e-6.
- The time cost of the EXX step may be affected by this setting.
@QuantumMisaka, do you want more discussion, or can we close this issue?
I have some updates and will add more discussion later.
I am testing HSE performance on the FeCx systems below.
- Fe2C
which has 16 Fe atoms and 8 C atoms in a 6.4 * 6.4 * 5.6 Å tetragonal cell.
ATOMIC_SPECIES
C 12.011 C_ONCV_PBE-1.0.upf upf201
Fe 55.845 Fe_ONCV_PBE-1.0.upf upf201
NUMERICAL_ORBITAL
C_gga_7au_100Ry_2s2p1d.orb
Fe_gga_8au_100Ry_4s2p2d1f.orb
LATTICE_CONSTANT
1.889726
LATTICE_VECTORS
6.3769109328 0.0343260832 -0.0001405731 #latvec1
0.5967808069 6.3487148203 0.0001877534 #latvec2
0.0004958326 -0.0005937998 5.6475745284 #latvec3
ATOMIC_POSITIONS
Direct
C #label
-1 #magnetism
8 #number of atoms
0.3333347121 0.0000000460 0.0416681325 m 1 1 1
0.8333318617 0.5000000561 0.0416692141 m 1 1 1
0.3333319326 0.4999999864 0.2916627638 m 1 1 1
0.8333347316 0.9999998433 0.2916668251 m 1 1 1
0.3333347550 0.0000000537 0.5416683381 m 1 1 1
0.8333320725 0.5000000845 0.5416689581 m 1 1 1
0.3333319742 0.5000000254 0.7916628430 m 1 1 1
0.8333347420 0.9999999572 0.7916668352 m 1 1 1
Fe #label
2 #magnetism
16 #number of atoms
0.5356480922 0.4526941341 0.0416327237 m 1 1 1
0.1310164298 0.5473060909 0.0416969262 m 1 1 1
0.0356503367 0.9526938603 0.0416364215 m 1 1 1
0.6310186916 0.0473059161 0.0417006360 m 1 1 1
0.8806444504 0.2976880874 0.2916325414 m 1 1 1
0.3806449276 0.7976871739 0.2916315420 m 1 1 1
0.2860222877 0.2023122773 0.2917006324 m 1 1 1
0.7860215681 0.7023128061 0.2917016191 m 1 1 1
0.5356481536 0.4526940290 0.5416326165 m 1 1 1
0.1310164033 0.5473060037 0.5416968098 m 1 1 1
0.0356502134 0.9526938868 0.5416363924 m 1 1 1
0.6310185668 0.0473059275 0.5417005647 m 1 1 1
0.8806444140 0.2976878707 0.7916326007 m 1 1 1
0.3806448806 0.7976870789 0.7916316216 m 1 1 1
0.2860223248 0.2023119874 0.7917007357 m 1 1 1
0.7860214815 0.7023128197 0.7917017073 m 1 1 1
- Fe3C
which has 12 Fe atoms and 4 C atoms in a 5.0 * 4.5 * 6.7 Å orthorhombic cell.
ATOMIC_SPECIES
C 12.011 C_ONCV_PBE-1.0.upf upf201
Fe 55.845 Fe_ONCV_PBE-1.0.upf upf201
NUMERICAL_ORBITAL
C_gga_7au_100Ry_2s2p1d.orb
Fe_gga_8au_100Ry_4s2p2d1f.orb
LATTICE_CONSTANT
1.8897
LATTICE_VECTORS
5.0336918943 -0.0000153613 0.0001148504 #latvec1
0.0000242702 4.5205688988 -0.0004021423 #latvec2
0.0000295086 -0.0042714172 6.7265819577 #latvec3
ATOMIC_POSITIONS
Direct
C #label
-1 #magnetism
4 #number of atoms
0.9999002510 0.7477581421 0.2496296076 m 1 1 1
0.4999042747 0.1269749211 0.2505363048 m 1 1 1
0.2501001489 0.6268918528 0.7497891752 m 1 1 1
0.7501027998 0.2478196639 0.7499201588 m 1 1 1
Fe #label
2 #magnetism
12 #number of atoms
0.3002934154 0.8583856824 0.0682195983 m 1 1 1
0.7999770331 0.0167677619 0.0682404297 m 1 1 1
0.1611385679 0.3533551730 0.2499013338 m 1 1 1
0.6611406049 0.5213978215 0.2499994318 m 1 1 1
0.8003304262 0.0164804414 0.4318831297 m 1 1 1
0.2999663836 0.8579983835 0.4318564945 m 1 1 1
0.4499017658 0.3579948191 0.5679338394 m 1 1 1
0.9498731410 0.5164006883 0.5681836656 m 1 1 1
0.5888838255 0.8532989502 0.7499398028 m 1 1 1
0.0888926917 0.0214473387 0.7500547648 m 1 1 1
0.4498977733 0.3584613447 0.9315988971 m 1 1 1
0.9499203204 0.5167191345 0.9318523371 m 1 1 1
Calculation settings:
KPT: using the 25 A^-1 setting, the mesh for both Fe2C and Fe3C is 9 9 9.
The INPUT below follows the advice given in this issue:
INPUT_PARAMETERS RUNNING ABACUS-DFT
#Parameters (1.General)
suffix Fe2C-HSE # suffix of OUTPUT DIR
#ntype 4 # number of element
nspin 2 # 1/2/4 4 for SOC
symmetry 0 # 0/1 1 for open, default
esolver_type ksdft # ksdft, ofdft, sdft, tddft, lj, dp
dft_functional hse # same as upf file, can be lda/pbe/scan/hf/pbe0/hse
ks_solver genelpa # default for ksdft-lcao
vdw_method none # none, d3, d3_bj
pseudo_dir /lustre/home/2201110432/example/abacus/PP
orbital_dir /lustre/home/2201110432/example/abacus/ORB
# SCF if HSE
exx_separate_loop 1 # default, optimized HSE method using LibRI
exx_cauchy_threshold 0 #default 1e-7, 0 to turn off
exx_cauchy_force_threshold 0
exx_cauchy_stress_threshold 0
exx_ccp_rmesh_times 1 # default 1.5
exx_dm_threshold 1e-3 # default 1e-4
mixing_gg0 0 # for HSE this is needed
#Parameters (2.Iteration)
calculation scf # scf relax cell-relax md
ecutwfc 100
scf_thr 1e-6
scf_nmax 300
#Parameters (3.Basis)
basis_type lcao # lcao or pw
#Parameters (4.Smearing)
smearing_method mp # mp/gaussian/fixed
smearing_sigma 0.002 # Rydberg
#Parameters (5.Mixing)
mixing_type broyden # pulay/broyden
mixing_ndim 8 # mixing dimension, for low-d can set to 20
#Parameters (6.Calculation)
cal_force 1
cal_stress 1
out_stru 1 # print STRU in OUT
out_chg 1 # print CHG or not
out_bandgap 1
out_mul 1 # print Mulliken charge and mag of atom in mulliken.txt
out_wfc_lcao 1 ## I forgot to close it sometimes
In addition:
- All calculations were performed on Intel-8358 servers, using 4 nodes with 64 cores each.
- Parallelism scheme: MPI=8, OMP=32
The test results so far are:
- With the original LibRI from GitHub, each EXX step takes a lot of time (on the order of 1E+4 s), which makes the whole HSE calculation very expensive:
GE22 2.26e+01 2.44e+01 -3.926133e+04 -3.829849e-09 1.132e-06 3.898e+01
GE23 2.26e+01 2.44e+01 -3.926133e+04 1.033255e-09 3.032e-07 3.861e+01
Updating EXX and rerun SCF 1.370e+04 (s)
GE1 2.54e+01 2.87e+01 -3.920527e+04 0.000000e+00 6.442e-02 1.991e+02
GE2 2.77e+01 3.04e+01 -3.923653e+04 -3.125964e+01 6.639e-02 1.976e+02
- With the loop3 version from gitee, which is recommended by @PeizeLin, the EXX time cost is much better (8E+2 to 1E+3 s) with the same parameters:
GE26 2.26e+01 2.44e+01 -3.926134e+04 -4.201078e-09 1.981e-06 3.689e+01
GE27 2.26e+01 2.44e+01 -3.926134e+04 1.806031e-08 8.870e-07 3.725e+01
Updating EXX and rerun SCF 8.483e+02 (s)
GE1 2.53e+01 2.87e+01 -3.920525e+04 0.000000e+00 6.444e-02 1.967e+02
GE2 2.77e+01 3.04e+01 -3.923653e+04 -3.128343e+01 6.641e-02 1.946e+02
- The Fe2C system converges in 120800 s (33.6 h).
But the Fe3C system, even though it has fewer atoms, is hard to converge:
Updating EXX and rerun SCF 1.022e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 2.960e-05 2.873e+02
GE2 2.97e+01 3.33e+01 -3.923971e+04 5.290983e-06 2.980e-05 2.835e+02
GE3 2.97e+01 3.33e+01 -3.923971e+04 5.187453e-06 4.958e-05 2.785e+02
GE4 2.97e+01 3.33e+01 -3.923971e+04 -7.941523e-06 2.576e-05 2.875e+02
GE5 2.97e+01 3.33e+01 -3.923971e+04 1.500385e-07 2.448e-05 2.881e+02
GE6 2.97e+01 3.33e+01 -3.923971e+04 -1.990061e-06 7.003e-06 2.765e+02
GE7 2.97e+01 3.33e+01 -3.923971e+04 -1.512264e-07 1.223e-06 2.865e+02
GE8 2.97e+01 3.33e+01 -3.923971e+04 3.427684e-09 1.540e-06 2.925e+02
GE9 2.97e+01 3.33e+01 -3.923971e+04 -5.395199e-09 3.099e-07 3.310e+02
Updating EXX and rerun SCF 1.014e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.050e-05 2.861e+02
GE2 2.97e+01 3.33e+01 -3.923971e+04 9.559464e-07 2.721e-05 2.842e+02
GE3 2.97e+01 3.33e+01 -3.923971e+04 7.419216e-06 3.428e-05 2.825e+02
GE4 2.97e+01 3.33e+01 -3.923971e+04 -2.696152e-06 3.636e-05 2.839e+02
GE5 2.97e+01 3.33e+01 -3.923971e+04 -4.620017e-06 1.934e-05 2.835e+02
GE6 2.97e+01 3.33e+01 -3.923971e+04 -9.654870e-07 2.373e-06 2.853e+02
GE7 2.97e+01 3.33e+01 -3.923971e+04 -2.641915e-09 2.520e-06 2.793e+02
GE8 2.97e+01 3.33e+01 -3.923971e+04 -1.140911e-08 1.989e-06 2.794e+02
GE9 2.97e+01 3.33e+01 -3.923971e+04 -1.233719e-08 3.303e-07 3.120e+02
Updating EXX and rerun SCF 1.011e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.178e-05 2.780e+02
GE2 2.97e+01 3.33e+01 -3.923971e+04 2.795153e-06 2.902e-05 2.777e+02
GE3 2.97e+01 3.33e+01 -3.923971e+04 6.991417e-06 4.116e-05 2.757e+02
GE4 2.97e+01 3.33e+01 -3.923971e+04 -5.639035e-06 3.640e-05 2.759e+02
GE5 2.97e+01 3.33e+01 -3.923971e+04 -1.501697e-06 2.542e-05 2.735e+02
GE6 2.97e+01 3.33e+01 -3.923971e+04 -1.930423e-06 7.286e-06 2.976e+02
GE7 2.97e+01 3.33e+01 -3.923971e+04 -1.525629e-07 1.379e-06 3.123e+02
GE8 2.97e+01 3.33e+01 -3.923971e+04 2.623354e-09 1.713e-06 3.190e+02
GE9 2.97e+01 3.33e+01 -3.923971e+04 -6.830619e-09 3.337e-07 3.508e+02
Updating EXX and rerun SCF 1.025e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.292e-05 2.773e+02
GE2 2.97e+01 3.33e+01 -3.923971e+04 1.766482e-06 2.947e-05 2.772e+02
GE3 2.97e+01 3.33e+01 -3.923971e+04 7.101357e-06 3.737e-05 2.806e+02
GE4 2.97e+01 3.33e+01 -3.923971e+04 -2.647007e-06 3.911e-05 2.818e+02
GE5 2.97e+01 3.33e+01 -3.923971e+04 -5.002272e-06 2.147e-05 2.842e+02
GE6 2.97e+01 3.33e+01 -3.923971e+04 -3.785116e-07 3.718e-06 2.861e+02
GE7 2.97e+01 3.33e+01 -3.923971e+04 -3.899145e-08 2.582e-06 2.852e+02
GE8 2.97e+01 3.33e+01 -3.923971e+04 -1.095126e-08 1.974e-06 2.816e+02
GE9 2.97e+01 3.33e+01 -3.923971e+04 -9.119867e-09 3.413e-07 3.264e+02
- Some parameters have an effect:
  - 4.1. If I increase `smearing_sigma` from 0.002 to 0.010, the SCF convergence gets worse.
  - 4.2. Increasing the OMP thread number (32->64, with the total core count kept at 256) leads to more time in each EXX and GE step, but decreasing it (32->16) leads to an OOM error.
  - 4.3. A larger `mixing_ndim` has little effect on performance but leads to an OOM error.
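In the Fe3C log above, the DRHO of the first inner step after each EXX update stays near 3e-5 instead of shrinking, whereas in the Fe-bcc run it dropped from one update to the next. The sketch below extracts that quantity as a simple stagnation check. It assumes the `nspin 2` log layout shown in this issue, the helper names are hypothetical, and the excerpts are abridged from the logs above.

```python
import re

def first_drho_after_each_exx_update(log_text):
    """Collect DRHO of the first GE line following each EXX update notice."""
    firsts = []
    expect_first = False
    for line in log_text.splitlines():
        if "Updating EXX and rerun SCF" in line:
            expect_first = True
            continue
        fields = line.split()
        # Assumed layout: GE<n> TMAG AMAG ETOT EDIFF DRHO TIME
        if expect_first and len(fields) == 7 and re.fullmatch(r"GE\d+", fields[0]):
            firsts.append(float(fields[5]))
            expect_first = False
    return firsts

def outer_loop_is_contracting(firsts):
    """True if every EXX update starts from a smaller DRHO than the previous one."""
    return all(later < earlier for earlier, later in zip(firsts, firsts[1:]))

FE3C_EXCERPT = """\
Updating EXX and rerun SCF 1.022e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 2.960e-05 2.873e+02
GE2 2.97e+01 3.33e+01 -3.923971e+04 5.290983e-06 2.980e-05 2.835e+02
Updating EXX and rerun SCF 1.014e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.050e-05 2.861e+02
Updating EXX and rerun SCF 1.011e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.178e-05 2.780e+02
Updating EXX and rerun SCF 1.025e+03 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 3.292e-05 2.773e+02
"""

FE_BCC_EXCERPT = """\
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.048e-04 1.776e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 6.077e-05 1.769e+01
Updating EXX and rerun SCF
GE1 5.29e+00 5.78e+00 -6.437418e+03 0.000000e+00 2.286e-05 1.784e+01
"""

fe3c_firsts = first_drho_after_each_exx_update(FE3C_EXCERPT)
fe_bcc_firsts = first_drho_after_each_exx_update(FE_BCC_EXCERPT)
print(fe3c_firsts, outer_loop_is_contracting(fe3c_firsts))
print(fe_bcc_firsts, outer_loop_is_contracting(fe_bcc_firsts))
```

A strictly decreasing sequence is a deliberately crude criterion. It is only meant to flag runs such as the Fe3C one, where more outer loops alone are unlikely to help.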
My next step is to test HSE performance on:
- Fe-O bulk systems
- FeCx surface systems
- FeCx surface systems with some C1 molecules adsorbed.
Now I'm wondering:
- What are the best parameters for HSE in spin-polarized systems (or just in Fe-Cx and Fe-C-H-O systems)?
- Does LibRI have more room for improvement in these spin-polarized and magnetic systems?
- What is the size limit for an HSE calculation (within 5 days of calculation time)? I've heard that VASP HSE can be done for surfaces that contain 50-60 atoms.
I may need more discussion and cooperation on HSE usage in these Fe-containing magnetic systems @PeizeLin @WHUweiqingzhou @mohanchen
I noticed that the KPT is much larger than the selected criterion ka > 25 Å requires, so I'm running a test with a proper KPT.
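A minimal sketch of that criterion, assuming "ka" means the number of k-points along an axis times the cell length along that axis in Å. The helper name kmesh_from_ka is hypothetical, and the rule is applied per axis to the cell edge lengths only:

```python
import math


def kmesh_from_ka(lengths_angstrom, ka_min=25.0):
    """Smallest k-point count per axis such that n_k * a >= ka_min (a in Angstrom)."""
    return tuple(max(1, math.ceil(ka_min / a)) for a in lengths_angstrom)


# Fe-bcc conventional cell from the top of this issue (a = 2.8301511117 Angstrom)
fe_bcc = (2.8301511117, 2.8301511117, 2.8301511117)
print(kmesh_from_ka(fe_bcc))

# Hypothetical 10 x 5 x 20 Angstrom cell, for illustration only
print(kmesh_from_ka((10.0, 5.0, 20.0)))
```

Larger cells need proportionally fewer k-points under this rule, so a mesh chosen for the small Fe-bcc cell is denser than necessary when reused for a bigger cell.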
After setting the k-point mesh via KSPACING 0.14 in the INPUT file, the calculation runs more reasonably. For the Fe2C system above, using OMP_NUM_THREADS=16 mpirun -np 16 abacus, the time cost of the SCF is entirely acceptable:
GE20 2.63e+01 2.91e+01 -5.276221e+04 -7.251346e-08 1.316e-06 3.808e+00
GE21 2.63e+01 2.91e+01 -5.276221e+04 1.201546e-08 7.316e-07 3.935e+00
Updating EXX and rerun SCF 5.924e+02 (s)
GE1 2.76e+01 3.32e+01 -5.267478e+04 0.000000e+00 7.036e-02 7.883e+00
...
TIME STATISTICS
-------------------------------------------------------------------------------------
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
-------------------------------------------------------------------------------------
total 75159.80 9 8351.09 100.00
Driver reading 0.07 1 0.07 0.00
Input Init 0.06 1 0.06 0.00
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 75159.73 1 75159.73 100.00
UnitCell check_tau 0.00 1 0.00 0.00
PW_Basis_Sup setuptransform 0.06 1 0.06 0.00
PW_Basis_Sup distributeg 0.00 1 0.00 0.00
mymath heapsort 0.00 3 0.00 0.00
PW_Basis_K setuptransform 0.01 1 0.01 0.00
PW_Basis_K distributeg 0.00 1 0.00 0.00
PW_Basis setup_struc_factor 0.01 1 0.01 0.00
NOrbital_Lm extra_uniform 29.94 22261 0.00 0.04
Mathzone_Add1 SplineD2 0.22 22261 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.40 22261 0.00 0.00
Mathzone_Add1 Uni_Deriv_Phi 27.95 22261 0.00 0.04
Exx_LRI init 92.02 1 92.02 0.12
Matrix_Orbs21 init 9.36 2 4.68 0.01
ORB_gaunt_table init_Gaunt_CH 0.88 3 0.29 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.44 408058 0.00 0.00
ORB_gaunt_table init_Gaunt 7.63 3 2.54 0.01
ORB_gaunt_table Get_Gaunt_SH 14.57 28208149 0.00 0.02
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 71.08 2 35.54 0.09
ORB_table_phi cal_ST_Phi12_R 44.64 43034 0.00 0.06
LRI_CV set_orbitals 61.00 1 61.00 0.08
Matrix_Orbs11 init 0.06 1 0.06 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 10.36 1 10.36 0.01
ppcell_vl init_vloc 0.01 1 0.01 0.00
Ions opt_ions 75066.34 1 75066.34 99.88
ESolver_KS_LCAO Run 28425.97 1 28425.97 37.82
ESolver_KS_LCAO beforescf 178.49 1 178.49 0.24
ESolver_KS_LCAO beforesolver 0.74 1 0.74 0.00
ESolver_KS_LCAO set_matrix_grid 0.64 1 0.64 0.00
atom_arrange search 0.00 1 0.00 0.00
Grid_Technique init 0.60 1 0.60 0.00
Grid_BigCell grid_expansion_index 0.00 2 0.00 0.00
Record_adj for_2d 0.03 1 0.03 0.00
Grid_Driver Find_atom 0.24 17400 0.00 0.00
LCAO_Hamilt grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.00 1 0.00 0.00
OverlapNew initialize_SR 0.00 1 0.00 0.00
EkineticNew initialize_HR 0.00 1 0.00 0.00
NonlocalNew initialize_HR 0.00 1 0.00 0.00
Charge set_rho_core 0.00 1 0.00 0.00
Charge atomic_rho 0.03 1 0.03 0.00
PW_Basis_Sup recip2real 1.92 3240 0.00 0.00
PW_Basis_Sup gathers_scatterp 0.69 3240 0.00 0.00
Potential init_pot 0.02 1 0.02 0.00
Potential update_from_charge 32.32 359 0.09 0.04
Potential cal_fixed_v 0.00 1 0.00 0.00
PotLocal cal_fixed_v 0.00 1 0.00 0.00
Potential cal_v_eff 32.29 359 0.09 0.04
H_Hartree_pw v_hartree 1.28 359 0.00 0.00
PW_Basis_Sup real2recip 2.88 3260 0.00 0.00
PW_Basis_Sup gatherp_scatters 1.35 3260 0.00 0.00
PotXC cal_v_eff 30.95 359 0.09 0.04
XC_Functional v_xc 14663.85 191 76.77 19.51
Potential interpolate_vrs 0.03 359 0.00 0.00
Exx_LRI cal_exx_ions 176.98 1 176.98 0.24
LRI_CV cal_datas 12.17 3 4.06 0.02
H_Ewald_pw compute_ewald 0.72 1 0.72 0.00
Charge_Mixing init_mixing 0.03 45 0.00 0.00
HSolverLCAO solve 2546.72 358 7.11 3.39
HamiltLCAO updateHk 1319.18 30072 0.04 1.76
OperatorLCAO init 134.75 118524 0.00 0.18
Veff contributeHR 133.19 716 0.19 0.18
Gint_interface cal_gint 154.59 1076 0.14 0.21
Gint_interface cal_gint_vlocal 104.96 716 0.15 0.14
Gint_Tools cal_psir_ylm 8.99 34368 0.00 0.01
Gint_k transfer_pvpR 28.22 716 0.04 0.04
OverlapNew calculate_SR 0.05 1 0.05 0.00
OverlapNew contributeHk 3.21 30072 0.00 0.00
EkineticNew contributeHR 0.08 716 0.00 0.00
EkineticNew calculate_HR 0.08 1 0.08 0.00
NonlocalNew contributeHR 0.69 716 0.00 0.00
NonlocalNew calculate_HR 0.29 1 0.29 0.00
OperatorLCAO contributeHk 6.12 30072 0.00 0.01
HSolverLCAO hamiltSolvePsiK 1074.60 30072 0.04 1.43
DiagoElpa elpa_solve 1059.26 30072 0.04 1.41
ElecStateLCAO psiToRho 152.86 358 0.43 0.20
elecstate cal_dm 35.04 359 0.10 0.05
psiMulPsiMpi pdgemm 34.28 30156 0.00 0.05
DensityMatrix cal_DMR 4.31 359 0.01 0.01
Local_Orbital_wfc wfc_2d_to_grid 44.45 33852 0.00 0.06
Gint transfer_DMR 14.18 358 0.04 0.02
Gint_interface cal_gint_rho 48.87 358 0.14 0.07
Charge_Mixing get_drho 0.04 358 0.00 0.00
Charge mix_rho 0.89 313 0.00 0.00
Charge Broyden_mixing 0.57 313 0.00 0.00
RI_2D_Comm split_m2D_ktoR 95.74 44 2.18 0.13
Exx_LRI cal_exx_elec 25560.41 44 580.92 34.01
RI_2D_Comm add_Hexx 1175.08 28308 0.04 1.56
XC_Functional v_xc_libxc 30.63 337 0.09 0.04
Exx_LRI write_Hexxs 0.57 1 0.57 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
HamiltLCAO updateSk 0.01 84 0.00 0.00
Force_Stress_LCAO getForceStress 46640.37 1 46640.37 62.05
Forces cal_force_loc 0.01 1 0.01 0.00
Forces cal_force_ew 0.02 1 0.02 0.00
Forces cal_force_cc 0.00 1 0.00 0.00
Forces cal_force_scc 0.01 1 0.01 0.00
Stress_Func stress_loc 0.03 1 0.03 0.00
Stress_Func stress_har 0.00 1 0.00 0.00
Stress_Func stress_ewa 0.01 1 0.01 0.00
Stress_Func stress_cc 0.00 1 0.00 0.00
Stress_Func stress_gga 0.02 1 0.02 0.00
Force_LCAO_k ftable_k 1.75 1 1.75 0.00
Force_LCAO_k allocate_k 0.44 1 0.44 0.00
LCAO_gen_fixedH b_NL_mu_new 0.19 1 0.19 0.00
Force_LCAO_k cal_foverlap_k 0.15 1 0.15 0.00
Force_LCAO_k cal_edm_2d 0.14 1 0.14 0.00
DensityMatrix sum_DMR_spin 0.00 1 0.00 0.00
Force_LCAO_k cal_ftvnl_dphi_k 0.00 1 0.00 0.00
Force_LCAO_k cal_fvl_dphi_k 0.77 1 0.77 0.00
Gint_interface cal_gint_force 0.77 2 0.38 0.00
Gint_Tools cal_dpsir_ylm 0.23 64 0.00 0.00
Gint_Tools cal_dpsirr_ylm 0.10 64 0.00 0.00
Force_LCAO_k cal_fvnl_dbeta_k_new 0.40 1 0.40 0.00
Exx_LRI cal_exx_force 4521.58 1 4521.58 6.02
Exx_LRI cal_exx_stress 42116.88 1 42116.88 56.04
ModuleIO write_istate_info 0.05 1 0.05 0.00
-------------------------------------------------------------------------------------
START Time : Tue Mar 19 20:15:41 2024
FINISH Time : Wed Mar 20 17:08:22 2024
TOTAL Time : 75161
SEE INFORMATION IN : OUT.Fe2C-HSE/
But the time cost of Force_Stress_LCAO getForceStress (62% of the run), almost all of it in cal_exx_stress (56%), is too large and needs to be optimized.
The force and stress calculations in HSE can be separated: if I turn off the stress calculation, the time cost is much lower.
TIME STATISTICS
-------------------------------------------------------------------------------------
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
-------------------------------------------------------------------------------------
total 33647.82 9 3738.65 100.00
Driver reading 0.11 1 0.11 0.00
Input Init 0.07 1 0.07 0.00
Input_Conv Convert 0.03 1 0.03 0.00
Driver driver_line 33647.71 1 33647.71 100.00
UnitCell check_tau 0.00 1 0.00 0.00
PW_Basis_Sup setuptransform 0.01 1 0.01 0.00
PW_Basis_Sup distributeg 0.00 1 0.00 0.00
mymath heapsort 0.00 3 0.00 0.00
PW_Basis_K setuptransform 0.02 1 0.02 0.00
PW_Basis_K distributeg 0.01 1 0.01 0.00
PW_Basis setup_struc_factor 0.01 1 0.01 0.00
NOrbital_Lm extra_uniform 30.20 22261 0.00 0.09
Mathzone_Add1 SplineD2 0.22 22261 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.47 22261 0.00 0.00
Mathzone_Add1 Uni_Deriv_Phi 28.16 22261 0.00 0.08
Exx_LRI init 94.32 1 94.32 0.28
Matrix_Orbs21 init 10.95 2 5.48 0.03
ORB_gaunt_table init_Gaunt_CH 0.96 3 0.32 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.48 408058 0.00 0.00
ORB_gaunt_table init_Gaunt 9.03 3 3.01 0.03
ORB_gaunt_table Get_Gaunt_SH 15.40 28208149 0.00 0.05
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 71.43 2 35.71 0.21
ORB_table_phi cal_ST_Phi12_R 44.92 43034 0.00 0.13
LRI_CV set_orbitals 62.43 1 62.43 0.19
Matrix_Orbs11 init 0.07 1 0.07 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 10.46 1 10.46 0.03
ppcell_vl init_vloc 0.01 1 0.01 0.00
Ions opt_ions 33551.43 1 33551.43 99.71
ESolver_KS_LCAO Run 28994.91 1 28994.91 86.17
ESolver_KS_LCAO beforescf 186.88 1 186.88 0.56
ESolver_KS_LCAO beforesolver 0.20 1 0.20 0.00
ESolver_KS_LCAO set_matrix_grid 0.10 1 0.10 0.00
atom_arrange search 0.00 1 0.00 0.00
Grid_Technique init 0.07 1 0.07 0.00
Grid_BigCell grid_expansion_index 0.00 2 0.00 0.00
Record_adj for_2d 0.03 1 0.03 0.00
Grid_Driver Find_atom 0.22 17400 0.00 0.00
LCAO_Hamilt grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.00 1 0.00 0.00
OverlapNew initialize_SR 0.00 1 0.00 0.00
EkineticNew initialize_HR 0.00 1 0.00 0.00
NonlocalNew initialize_HR 0.00 1 0.00 0.00
Exx_LRI cal_exx_ions 185.47 1 185.47 0.55
LRI_CV cal_datas 12.49 3 4.16 0.04
Charge set_rho_core 0.00 1 0.00 0.00
Charge atomic_rho 1.18 1 1.18 0.00
PW_Basis_Sup recip2real 3.29 3234 0.00 0.01
PW_Basis_Sup gathers_scatterp 2.14 3234 0.00 0.01
Potential init_pot 0.03 1 0.03 0.00
Potential update_from_charge 34.05 359 0.09 0.10
Potential cal_fixed_v 0.00 1 0.00 0.00
PotLocal cal_fixed_v 0.00 1 0.00 0.00
Potential cal_v_eff 34.03 359 0.09 0.10
H_Hartree_pw v_hartree 1.72 359 0.00 0.01
PW_Basis_Sup real2recip 3.80 3255 0.00 0.01
PW_Basis_Sup gatherp_scatters 2.56 3255 0.00 0.01
PotXC cal_v_eff 32.25 359 0.09 0.10
XC_Functional v_xc 14926.25 191 78.15 44.36
Potential interpolate_vrs 0.02 359 0.00 0.00
H_Ewald_pw compute_ewald 0.00 1 0.00 0.00
Charge_Mixing init_mixing 0.04 45 0.00 0.00
HSolverLCAO solve 2960.28 358 8.27 8.80
HamiltLCAO updateHk 1395.76 30072 0.05 4.15
OperatorLCAO init 140.79 120288 0.00 0.42
Veff contributeHR 139.42 716 0.19 0.41
Gint_interface cal_gint 158.20 1076 0.15 0.47
Gint_interface cal_gint_vlocal 106.82 716 0.15 0.32
Gint_Tools cal_psir_ylm 8.88 34368 0.00 0.03
Gint_k transfer_pvpR 32.60 716 0.05 0.10
OverlapNew calculate_SR 0.05 1 0.05 0.00
OverlapNew contributeHk 3.45 30072 0.00 0.01
EkineticNew contributeHR 0.08 716 0.00 0.00
EkineticNew calculate_HR 0.08 1 0.08 0.00
NonlocalNew contributeHR 0.69 716 0.00 0.00
NonlocalNew calculate_HR 0.29 1 0.29 0.00
OperatorLCAO contributeHk 8.64 30072 0.00 0.03
HSolverLCAO hamiltSolvePsiK 1403.75 30072 0.05 4.17
DiagoElpa elpa_solve 1382.46 30072 0.05 4.11
ElecStateLCAO psiToRho 160.68 358 0.45 0.48
elecstate cal_dm 34.69 359 0.10 0.10
psiMulPsiMpi pdgemm 33.93 30156 0.00 0.10
DensityMatrix cal_DMR 4.31 359 0.01 0.01
Local_Orbital_wfc wfc_2d_to_grid 45.35 33852 0.00 0.13
Gint transfer_DMR 15.27 358 0.04 0.05
Gint_interface cal_gint_rho 50.83 358 0.14 0.15
Charge_Mixing get_drho 0.07 358 0.00 0.00
Charge mix_rho 2.18 313 0.01 0.01
Charge Broyden_mixing 1.86 313 0.01 0.01
RI_2D_Comm split_m2D_ktoR 102.40 44 2.33 0.30
Exx_LRI cal_exx_elec 25445.02 44 578.30 75.62
RI_2D_Comm add_Hexx 1429.29 32004 0.04 4.25
XC_Functional v_xc_libxc 31.90 337 0.09 0.09
Exx_LRI write_Hexxs 0.71 1 0.71 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
HamiltLCAO updateSk 0.01 84 0.00 0.00
Force_Stress_LCAO getForceStress 4556.51 1 4556.51 13.54
Forces cal_force_loc 0.00 1 0.00 0.00
Forces cal_force_ew 0.00 1 0.00 0.00
Forces cal_force_cc 0.00 1 0.00 0.00
Forces cal_force_scc 0.01 1 0.01 0.00
Force_LCAO_k ftable_k 1.46 1 1.46 0.00
Force_LCAO_k allocate_k 0.42 1 0.42 0.00
LCAO_gen_fixedH b_NL_mu_new 0.20 1 0.20 0.00
Force_LCAO_k cal_foverlap_k 0.17 1 0.17 0.00
Force_LCAO_k cal_edm_2d 0.16 1 0.16 0.00
DensityMatrix sum_DMR_spin 0.00 1 0.00 0.00
Force_LCAO_k cal_ftvnl_dphi_k 0.00 1 0.00 0.00
Force_LCAO_k cal_fvl_dphi_k 0.55 1 0.55 0.00
Gint_interface cal_gint_force 0.55 2 0.28 0.00
Gint_Tools cal_dpsir_ylm 0.22 64 0.00 0.00
Force_LCAO_k cal_fvnl_dbeta_k_new 0.20 1 0.20 0.00
Exx_LRI cal_exx_force 4554.97 1 4554.97 13.54
ModuleIO write_istate_info 0.22 1 0.22 0.00
-------------------------------------------------------------------------------------
START Time : Wed Mar 20 19:23:23 2024
FINISH Time : Thu Mar 21 04:44:11 2024
TOTAL Time : 33648
SEE INFORMATION IN : OUT.Fe2C-HSE
Then the time cost of the HSE SCF is acceptable.
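To compare such runs without reading the whole table, the TIME STATISTICS rows can be ranked with a small script. This is a sketch that assumes every data row ends with the four numeric fields TIME, CALLS, AVG and PER; the function name parse_timing is hypothetical, and the sample rows are copied from the first Fe2C table above (stress on).

```python
SAMPLE = """\
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
-------------------------------------------------
total 75159.80 9 8351.09 100.00
Exx_LRI cal_exx_elec 25560.41 44 580.92 34.01
Force_Stress_LCAO getForceStress 46640.37 1 46640.37 62.05
Exx_LRI cal_exx_force 4521.58 1 4521.58 6.02
Exx_LRI cal_exx_stress 42116.88 1 42116.88 56.04
""".splitlines()


def parse_timing(lines):
    """Return (class_name, name, seconds, percent) for every data row."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # separator lines
        try:
            seconds = float(parts[-4])
            int(parts[-3])  # CALLS must be an integer
            percent = float(parts[-1])
        except ValueError:
            continue  # header line
        rows.append((parts[0], " ".join(parts[1:-4]), seconds, percent))
    return rows


rows = parse_timing(SAMPLE)
# Timers are nested (getForceStress includes the EXX force and stress),
# so the percentages overlap and do not add up to 100.
exx_rows = sorted((r for r in rows if r[0] == "Exx_LRI"),
                  key=lambda r: r[2], reverse=True)
for cls, name, seconds, percent in exx_rows:
    print(f"{name:16s} {seconds:10.2f} s {percent:6.2f} %")

force_stress = next(r[2] for r in rows if r[1] == "getForceStress")
exx_share = sum(r[2] for r in exx_rows if r[1] != "cal_exx_elec") / force_stress
```

For the sample rows, cal_exx_stress ranks first, and cal_exx_force plus cal_exx_stress account for nearly all of getForceStress.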
But I still wonder: can we use wavefunction extrapolation in HSE AIMD/optimization calculations? The separate-loop algorithm always runs PBE first.
There is only chg_extrap in ABACUS, which cannot be used with HSE. I wonder about another approach via ase-abacus that uses restart_save and restart_load.
@PeizeLin
For the Fe3C system above, the calculation can be completed, but the SCF performance is poor: the DRHO of the first step after each EXX update stays at (6-8)e-6 and CANNOT reach the 1e-6 threshold, and with the default EXX parameters convergence is even harder. Also, it seems the calculation did NOT finish by converging the separate loop to scf_thr 1e-6.
Updating EXX and rerun SCF 6.049e+02 (s)
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 6.828e-06 1.784e+01
GE2 2.97e+01 3.33e+01 -3.923971e+04 9.753741e-07 6.848e-06 1.774e+01
GE3 2.97e+01 3.33e+01 -3.923971e+04 -7.646272e-07 8.249e-06 1.780e+01
GE4 2.97e+01 3.33e+01 -3.923971e+04 -1.425273e-07 3.995e-06 1.775e+01
GE5 2.97e+01 3.33e+01 -3.923971e+04 -1.786850e-08 5.548e-06 1.774e+01
GE6 2.97e+01 3.33e+01 -3.923971e+04 -8.728838e-08 1.483e-06 1.777e+01
GE7 2.97e+01 3.33e+01 -3.923971e+04 5.073467e-09 6.515e-07 1.792e+01
----------------------------------------------------------------
TOTAL-STRESS (KBAR)
----------------------------------------------------------------
148.9112096763 0.0072703256 -0.1701243744
0.0070003380 91.9771779461 -0.8375153913
-0.1703347058 -0.8376343605 19.3338642150
----------------------------------------------------------------
TOTAL-PRESSURE: 86.740751 KBAR
TIME STATISTICS
--------------------------------------------------------------------------------------
CLASS_NAME NAME TIME(Sec) CALLS AVG(Sec) PER(%)
--------------------------------------------------------------------------------------
total 104645.52 9 11627.28 100.00
Driver reading 0.03 1 0.03 0.00
Input Init 0.03 1 0.03 0.00
Input_Conv Convert 0.00 1 0.00 0.00
Driver driver_line 104645.49 1 104645.49 100.00
UnitCell check_tau 0.00 1 0.00 0.00
PW_Basis_Sup setuptransform 0.01 1 0.01 0.00
PW_Basis_Sup distributeg 0.00 1 0.00 0.00
mymath heapsort 0.00 3 0.00 0.00
PW_Basis_K setuptransform 0.01 1 0.01 0.00
PW_Basis_K distributeg 0.00 1 0.00 0.00
PW_Basis setup_struc_factor 0.01 1 0.01 0.00
NOrbital_Lm extra_uniform 27.40 22261 0.00 0.03
Mathzone_Add1 SplineD2 0.22 22261 0.00 0.00
Mathzone_Add1 Cubic_Spline_Interpolation 0.43 22261 0.00 0.00
Mathzone_Add1 Uni_Deriv_Phi 25.60 22261 0.00 0.02
Exx_LRI init 67.52 1 67.52 0.06
Matrix_Orbs21 init 9.46 2 4.73 0.01
ORB_gaunt_table init_Gaunt_CH 0.88 3 0.29 0.00
ORB_gaunt_table Calc_Gaunt_CH 0.44 408058 0.00 0.00
ORB_gaunt_table init_Gaunt 7.75 3 2.58 0.01
ORB_gaunt_table Get_Gaunt_SH 9.56 28208149 0.00 0.01
Matrix_Orbs21 init_radial 0.00 2 0.00 0.00
Matrix_Orbs21 init_radial_table 51.59 2 25.79 0.05
ORB_table_phi cal_ST_Phi12_R 24.79 43034 0.00 0.02
LRI_CV set_orbitals 39.42 1 39.42 0.04
Matrix_Orbs11 init 0.06 1 0.06 0.00
Matrix_Orbs11 init_radial 0.00 1 0.00 0.00
Matrix_Orbs11 init_radial_table 5.29 1 5.29 0.01
ppcell_vl init_vloc 0.01 1 0.01 0.00
Ions opt_ions 104576.84 1 104576.84 99.93
ESolver_KS_LCAO Run 76530.48 1 76530.48 73.13
ESolver_KS_LCAO beforescf 132.43 1 132.43 0.13
ESolver_KS_LCAO beforesolver 0.19 1 0.19 0.00
ESolver_KS_LCAO set_matrix_grid 0.11 1 0.11 0.00
atom_arrange search 0.00 1 0.00 0.00
Grid_Technique init 0.09 1 0.09 0.00
Grid_BigCell grid_expansion_index 0.00 2 0.00 0.00
Record_adj for_2d 0.02 1 0.02 0.00
Grid_Driver Find_atom 0.52 29264 0.00 0.00
LCAO_Hamilt grid_prepare 0.00 1 0.00 0.00
Veff initialize_HR 0.00 1 0.00 0.00
OverlapNew initialize_SR 0.00 1 0.00 0.00
EkineticNew initialize_HR 0.00 1 0.00 0.00
NonlocalNew initialize_HR 0.01 1 0.01 0.00
Charge set_rho_core 0.00 1 0.00 0.00
Charge atomic_rho 0.05 1 0.05 0.00
PW_Basis_Sup recip2real 4.08 8208 0.00 0.00
PW_Basis_Sup gathers_scatterp 1.60 8208 0.00 0.00
Potential init_pot 0.04 1 0.04 0.00
Potential update_from_charge 95.42 911 0.10 0.09
Potential cal_fixed_v 0.00 1 0.00 0.00
PotLocal cal_fixed_v 0.00 1 0.00 0.00
Potential cal_v_eff 95.35 911 0.10 0.09
H_Hartree_pw v_hartree 2.24 911 0.00 0.00
PW_Basis_Sup real2recip 4.76 8235 0.00 0.00
PW_Basis_Sup gatherp_scatters 2.52 8235 0.00 0.00
PotXC cal_v_eff 92.93 911 0.10 0.09
XC_Functional v_xc 39320.76 470 83.66 37.58
Potential interpolate_vrs 0.07 911 0.00 0.00
Exx_LRI cal_exx_ions 131.73 1 131.73 0.13
LRI_CV cal_datas 8.26 3 2.75 0.01
H_Ewald_pw compute_ewald 0.42 1 0.42 0.00
Charge_Mixing init_mixing 0.10 101 0.00 0.00
HSolverLCAO solve 15672.73 910 17.22 14.98
HamiltLCAO updateHk 9583.95 112840 0.08 9.16
OperatorLCAO init 275.57 447888 0.00 0.26
Veff contributeHR 272.48 1820 0.15 0.26
Gint_interface cal_gint 320.82 2732 0.12 0.31
Gint_interface cal_gint_vlocal 239.64 1820 0.13 0.23
Gint_Tools cal_psir_ylm 12.80 65520 0.00 0.01
Gint_k transfer_pvpR 32.83 1820 0.02 0.03
OverlapNew calculate_SR 0.02 1 0.02 0.00
OverlapNew contributeHk 10.87 112840 0.00 0.01
EkineticNew contributeHR 0.04 1820 0.00 0.00
EkineticNew calculate_HR 0.04 1 0.04 0.00
NonlocalNew contributeHR 1.43 1820 0.00 0.00
NonlocalNew calculate_HR 0.20 1 0.20 0.00
OperatorLCAO contributeHk 26.87 112840 0.00 0.03
HSolverLCAO hamiltSolvePsiK 5435.97 112840 0.05 5.19
DiagoElpa elpa_solve 5276.76 112840 0.05 5.04
ElecStateLCAO psiToRho 652.46 910 0.72 0.62
elecstate cal_dm 456.91 911 0.50 0.44
psiMulPsiMpi pdgemm 454.76 112964 0.00 0.43
DensityMatrix cal_DMR 10.47 911 0.01 0.01
Local_Orbital_wfc wfc_2d_to_grid 66.30 125364 0.00 0.06
Gint transfer_DMR 18.41 910 0.02 0.02
Gint_interface cal_gint_rho 80.75 910 0.09 0.08
Charge_Mixing get_drho 0.05 910 0.00 0.00
Charge mix_rho 2.00 809 0.00 0.00
Charge Broyden_mixing 1.22 809 0.00 0.00
RI_2D_Comm split_m2D_ktoR 369.52 100 3.70 0.35
Exx_LRI cal_exx_elec 60234.83 100 602.35 57.56
RI_2D_Comm add_Hexx 9270.09 109368 0.08 8.86
XC_Functional v_xc_libxc 92.52 882 0.10 0.09
Exx_LRI write_Hexxs 1.62 1 1.62 0.00
ESolver_KS_LCAO out_deepks_labels 0.00 1 0.00 0.00
LCAO_Deepks_Interface out_deepks_labels 0.00 1 0.00 0.00
HamiltLCAO updateSk 0.01 124 0.00 0.00
Force_Stress_LCAO getForceStress 28046.31 1 28046.31 26.80
Forces cal_force_loc 0.00 1 0.00 0.00
Forces cal_force_ew 0.00 1 0.00 0.00
Forces cal_force_cc 0.00 1 0.00 0.00
Forces cal_force_scc 0.01 1 0.01 0.00
Stress_Func stress_loc 0.02 1 0.02 0.00
Stress_Func stress_har 0.00 1 0.00 0.00
Stress_Func stress_ewa 0.00 1 0.00 0.00
Stress_Func stress_cc 0.00 1 0.00 0.00
Stress_Func stress_gga 0.01 1 0.01 0.00
Force_LCAO_k ftable_k 1.49 1 1.49 0.00
Force_LCAO_k allocate_k 0.25 1 0.25 0.00
LCAO_gen_fixedH b_NL_mu_new 0.12 1 0.12 0.00
Force_LCAO_k cal_foverlap_k 0.55 1 0.55 0.00
Force_LCAO_k cal_edm_2d 0.54 1 0.54 0.00
DensityMatrix sum_DMR_spin 0.00 1 0.00 0.00
Force_LCAO_k cal_ftvnl_dphi_k 0.00 1 0.00 0.00
Force_LCAO_k cal_fvl_dphi_k 0.42 1 0.42 0.00
Gint_interface cal_gint_force 0.42 2 0.21 0.00
Gint_Tools cal_dpsir_ylm 0.15 48 0.00 0.00
Gint_Tools cal_dpsirr_ylm 0.07 48 0.00 0.00
Force_LCAO_k cal_fvnl_dbeta_k_new 0.25 1 0.25 0.00
Exx_LRI cal_exx_force 4381.26 1 4381.26 4.19
Exx_LRI cal_exx_stress 23663.47 1 23663.47 22.61
ModuleIO write_istate_info 0.07 1 0.07 0.00
--------------------------------------------------------------------------------------
START Time : Tue Mar 19 19:09:13 2024
FINISH Time : Thu Mar 21 00:13:18 2024
TOTAL Time : 104645
SEE INFORMATION IN : OUT.Fe3C-HSE/
I wonder:
- Why does this happen? Are there some limits in the EXX calculation?
- Can the user set a stopping point for the HSE calculation? For example, in the HSE run above, one may consider EDIFF = 1e-6 in the second SCF (outside of EXX) a good convergence point and want the HSE SCF to end there.
After scf_ene_thr was added, there seems to be a way to stop the EXX calculation early.
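To see how an energy-based stopping point would differ from the density criterion, the GE lines can be checked after the run. This is only a log post-processing sketch, not a description of how ABACUS applies scf_ene_thr. It assumes the nspin 2 column order ITER, TMAG, AMAG, ETOT, EDIFF, DRHO, TIME; the function name and the choice to skip GE1 (whose EDIFF is printed as 0) are my own. The sample lines are the last Fe3C block above.

```python
GE_LINES = """\
GE1 2.97e+01 3.33e+01 -3.923971e+04 0.000000e+00 6.828e-06 1.784e+01
GE2 2.97e+01 3.33e+01 -3.923971e+04 9.753741e-07 6.848e-06 1.774e+01
GE3 2.97e+01 3.33e+01 -3.923971e+04 -7.646272e-07 8.249e-06 1.780e+01
GE4 2.97e+01 3.33e+01 -3.923971e+04 -1.425273e-07 3.995e-06 1.775e+01
GE5 2.97e+01 3.33e+01 -3.923971e+04 -1.786850e-08 5.548e-06 1.774e+01
GE6 2.97e+01 3.33e+01 -3.923971e+04 -8.728838e-08 1.483e-06 1.777e+01
GE7 2.97e+01 3.33e+01 -3.923971e+04 5.073467e-09 6.515e-07 1.792e+01
""".splitlines()


def energy_ok_but_drho_not(lines, ene_thr=1e-6, drho_thr=1e-6):
    """Steps where |EDIFF| < ene_thr although DRHO is still >= drho_thr."""
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) != 7 or not parts[0].startswith("GE"):
            continue  # not an SCF iteration line
        if parts[0] == "GE1":
            continue  # EDIFF is printed as 0 here, so it carries no information
        ediff, drho = float(parts[4]), float(parts[5])
        if abs(ediff) < ene_thr and drho >= drho_thr:
            hits.append(parts[0])
    return hits


print(energy_ok_but_drho_not(GE_LINES))
```

In this block the energy change is already below 1e-6 eV from GE2 onward, while DRHO only drops below 1e-6 at GE7.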
But HSE convergence in magnetic systems is still a problem.
I'll keep an eye on it.