
RT-TDDFT restart error

Open 1041176461 opened this issue 1 year ago • 6 comments

Describe the bug

When running RT-TDDFT with md_restart 1 and init_wfc file to restart a job, a warning is reported: [image]. However, the OUT* directory does contain wfc files, just under different names: [image]

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Verify the issue is not a duplicate.
  • [ ] Describe the bug.
  • [ ] Steps to reproduce.
  • [ ] Expected behavior.
  • [ ] Error message.
  • [ ] Environment details.
  • [ ] Additional context.
  • [ ] Assign a priority level (low, medium, high, urgent).
  • [ ] Assign the issue to a team member.
  • [ ] Label the issue with relevant tags.
  • [ ] Identify possible related issues.
  • [ ] Create a unit test or automated test to reproduce the bug (if applicable).
  • [ ] Fix the bug.
  • [ ] Test the fix.
  • [ ] Update documentation (if necessary).
  • [ ] Close the issue and inform the reporter (if applicable).

1041176461 avatar Jul 19 '24 02:07 1041176461

The wfc files in the OUT* directory come from a job run with out_app_flag = false, which writes WFC files whose names include the ION step. At present, ABACUS does not support init_wfc from files generated with out_app_flag = false, because it cannot tell which ION step should be read. To support this case, we would need to add an INPUT parameter that tells ABACUS the ION step of the WFC file.

pxlxingliang avatar Jul 19 '24 07:07 pxlxingliang

As far as I know, TDDFT needs to read the time-evolved wave function to restart the task. I don't know how this is implemented in the program, but reading an appended file is definitely unrealistic. Maybe @AsTonyshment and @lyb9812 are more familiar with it.

1041176461 avatar Jul 19 '24 10:07 1041176461

@1041176461 As I recall, if a TDDFT job may need to be restarted, you should set out_wfc_lcao=1 during the original calculation, and then continue it by setting md_restart=1. There should be no need to set the init_wfc parameter. However, as far as I know, there seem to be some issues with reading in the electric field during restart. It might be necessary to reset the electric field starting from the restart point; otherwise the efield_*.dat file might cause problems. This is based on my experience from three months ago, so I'm not sure whether the restart logic has been updated since then...

AsTonyshment avatar Jul 19 '24 10:07 AsTonyshment

  1. I had also restarted without init_wfc before, but the self-consistent energy of the last step before the restart and that of the first step after the restart were inconsistent. I turned off the electric field for that restart because the field had already ended before the restart point: https://github.com/deepmodeling/abacus-develop/issues/4617#issuecomment-2219302735

  2. If init_wfc does not need to be set, why do we need to set out_wfc_lcao? Will md_restart read the wave function?

  3. out_wfc_lcao=1 with out_app_flag = 0 will generate a huge file for MD, and I think reading this file will be very slow.

1041176461 avatar Jul 20 '24 01:07 1041176461

The restart logic was written by @lyb9812. Maybe you can explain about this?

AsTonyshment avatar Jul 20 '24 01:07 AsTonyshment


Both init_wfc and md_restart need to be set. You can set out_interval to reduce the output. The velocity gauge may not support the restart function. It hasn't been tested yet.

lyb9812 avatar Jul 24 '24 17:07 lyb9812

It's not a good idea to print out the wave functions at every TDDFT step; the files would be too large.

mohanchen avatar Nov 23 '24 04:11 mohanchen

After a thorough investigation, I have summarized the following restart calculation scheme suitable for RT-TDDFT:

  1. After the calculation is interrupted, it is best not to restart the calculation in the original directory. Instead, create a new independent directory specifically for the restart calculation.

  2. The restart calculation for RT-TDDFT requires the following three files:
     a. Restart_md.dat, used to read the interrupted MD step;
     b. STRU_MD_*, to obtain the ionic configuration (positions, velocities, etc.) at the interruption point;
     c. WFC_NAO_K*.txt, to obtain the wavefunction coefficient file at the interruption point.

Attention: the restart function seems to have been designed with only out_app_flag=1 in mind. At this stage it therefore cannot read wavefunction files with names like WFC_NAO_K*_ION*.txt. In the current version of ABACUS, out_app_flag=1 causes the contents of WFC_NAO_K*.txt to be overwritten at each ionic step. Thus, if you use out_app_flag=0 to keep the wavefunction information of each ionic step, you must remove the _ION* suffix from the corresponding WFC_NAO_K*_ION*.txt file, renaming it to WFC_NAO_K*.txt, before restarting.

It is worth noting that since the step numbering in STRU_MD_* starts from 0, while the step numbering in WFC_NAO_K*_ION*.txt starts from 1, you need to ensure that the ionic configuration file and the wavefunction file used for restart match in terms of step numbering. For example, STRU_MD_10 should be paired with WFC_NAO_K*_ION11.txt for the restart. If you find this cumbersome, you can set md_restartfreq (which controls the output frequency of configuration files) and out_interval (which controls the output frequency of wavefunction files) to the same value. This way, you only need to use the last output configuration file and wavefunction file for the restart, as they are saved with the same step interval.
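The pairing and renaming described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the directory arguments, and the exact filename pattern (WFC_NAO_K<k>_ION<n>.txt with integer k and n) are hypothetical choices for this sketch and are not part of ABACUS. Where the WFC files actually live inside OUT* should be checked for your own run.

```python
import re
import shutil
from pathlib import Path

# Assumed filename pattern for per-step wavefunction files,
# e.g. WFC_NAO_K1_ION11.txt (k-point index 1, ionic step 11).
_WFC_ION = re.compile(r"^(WFC_NAO_K\d+)_ION(\d+)\.txt$")


def wfc_step_for_stru(stru_step):
    """STRU_MD_* is numbered from 0 and WFC_NAO_K*_ION*.txt from 1,
    so the matching wavefunction step is one higher."""
    return stru_step + 1


def stage_restart_wfc(wfc_dir, restart_dir, stru_step):
    """Copy the wavefunction files that match STRU_MD_<stru_step> into
    restart_dir, dropping the _ION<n> suffix from each name.

    Returns the list of files written. The originals are left untouched.
    """
    wanted = wfc_step_for_stru(stru_step)
    restart_dir = Path(restart_dir)
    restart_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(Path(wfc_dir).glob("WFC_NAO_K*_ION*.txt")):
        match = _WFC_ION.match(src.name)
        if match and int(match.group(2)) == wanted:
            dst = restart_dir / (match.group(1) + ".txt")
            shutil.copyfile(src, dst)
            staged.append(dst)
    return staged
```

For example, `stage_restart_wfc("OUT.mysuffix", "restart", 10)` (with a hypothetical output directory name) would copy every WFC_NAO_K*_ION11.txt to restart/WFC_NAO_K*.txt, ready to sit next to STRU_MD_10 and Restart_md.dat. Copying rather than renaming in place keeps the original run directory intact, in line with the advice to restart in a separate directory.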

  3. For the restart, it is suggested to set the INPUT file as follows:
md_restart         1
md_restartfreq     5 # any output interval you like
out_interval       5 # suggest keeping consistent with md_restartfreq
out_app_flag       0
read_file_dir      ./restart
init_wfc           file
out_wfc_lcao       1

Here, out_wfc_lcao=1 ensures the output of wavefunction coefficient files, md_restart=1 indicates that this is a restart calculation, init_wfc=file means reading the wavefunction from a file, and read_file_dir is the directory where the aforementioned Restart_md.dat, STRU_MD_*, and WFC_NAO_K*.txt files are stored. If out_app_flag=0, you will also need to manually remove the suffix from the WFC_NAO_K*_ION*.txt file and rename it to WFC_NAO_K*.txt.
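Before launching a restart, the settings listed above can be checked with a few lines of Python. This is a rough sketch, not an ABACUS tool: `parse_input` and `check_restart_input` are hypothetical helpers, and the parser assumes the simple "key value # comment" layout shown in the INPUT fragment above.

```python
def parse_input(text):
    """Parse 'key value' lines; '#' starts a comment. Lines that do not
    contain both a key and a value are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


def check_restart_input(params):
    """Return a list of human-readable problems; an empty list means the
    settings match the restart recipe described in this thread."""
    problems = []
    expected = {"md_restart": "1", "init_wfc": "file", "out_wfc_lcao": "1"}
    for key, want in expected.items():
        if params.get(key) != want:
            problems.append(f"{key} should be {want}, found {params.get(key)!r}")
    if "read_file_dir" not in params:
        problems.append("read_file_dir is not set")
    # Equal intervals are only a suggestion in the thread, but they keep the
    # last STRU_MD_* and the last wavefunction file on the same step.
    if params.get("md_restartfreq") != params.get("out_interval"):
        problems.append("md_restartfreq and out_interval differ")
    return problems
```

Calling `check_restart_input(parse_input(open("INPUT").read()))` returns an empty list for the fragment above and a short list of messages otherwise.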

  4. After PR #5877, the electric field parameters no longer need to be modified during the restart. The program will automatically read the step at which the calculation was interrupted and apply the electric field starting from the restart MD step. As shown in the tests, the energy remains consistent before and after the restart (and the dipole moment is also consistent, although it is not plotted).

Image

Below is a comparison of the output log files. On the left is the complete simulation, and on the right is the calculation restarted at the 10th MD step. Their energy and dipole values are consistent. The dipole values differ slightly in the digits after the decimal point, which comes from how the restart logic is implemented: in the first step after a restart, the program behaves slightly differently from a normal calculation. After reading the WFC file, it neither evolves the wavefunctions nor solves the eigenvalue equation. Instead, it skips module_tddft::Evolve_elec::solve_psi and performs charge mixing and SCF iterations to try to restore the system state at the interruption point (see module_esolver/esolver_ks_lcao_tddft.cpp, or PR #2628). It cannot evolve because the wavefunctions from the previous time step are not available, which is a compromise required for integrating RT-TDDFT into the overall MD framework.

Image

AsTonyshment avatar Jan 23 '25 07:01 AsTonyshment