Calculations in 00.scf do not converge, so training cannot continue (RuntimeError: No system is avaliable)
I am using deepks for multi-label training. In the log.data under 00.scf of each iter, the calculations gradually stop converging, until training can no longer continue: in iter.02 no configuration in 00.scf converges, which causes an error in 01.train of iter.02. Any suggestions for setting and tuning the training parameters?
This is the error message.
data_train/group.00 no system.raw, infer meta from data
data_train/group.00 reset batch size to 0
ignore empty dataset: data_train/group.00
Traceback (most recent call last):
File "
This is the log.data of iter.init, iter.00, iter.01, and iter.02.
Have you tried following the settings suggested by the official doc?
No, I had not followed those settings before. I'm now testing with the settings suggested by the official doc and waiting for the results.
After following the official doc settings in init_train, the following error occurs in 01.train. How can I resolve this error so that training can start?
data_train/group.00 no system.raw, infer meta from data
data_test/group.01 no system.raw, infer meta from data
Traceback (most recent call last):
File "
After setting this, the convergence rate improves in the first few rounds of training, but at iter.03 the convergence rate drops to 0. Is there any experience or advice for this situation? Here is the log.data of the training for iter.init, iter.00, iter.01, iter.02, and iter.03, respectively (a sketch for tallying the convergence rate from these logs follows the list).
iter.init
iter.00
iter.01
iter.02
iter.03
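Below is a rough sketch of how one might tally the per-iteration SCF convergence rate from these log.data files. The search pattern is an assumption about how log.data marks converged frames, and the directory layout is hypothetical; grep your own log first and adapt both.

```python
# Rough sketch: tally SCF convergence per iteration from log.data.
# ASSUMPTION: log.data prints one "converge..." line per frame, with
# "not" appearing on failed frames -- inspect your file and adjust.
import re
from pathlib import Path

def conv_rate(log_path: Path) -> float:
    total = converged = 0
    for line in log_path.read_text().splitlines():
        if re.search(r"converge", line, re.IGNORECASE):
            total += 1
            if "not" not in line.lower():
                converged += 1
    return converged / total if total else float("nan")

for it in ["iter.init", "iter.00", "iter.01", "iter.02", "iter.03"]:
    log = Path(it) / "00.scf" / "log.data"  # hypothetical layout
    if log.exists():
        print(f"{it}: convergence rate = {conv_rate(log):.2f}")
```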
@yycx1111 Maybe you can try the following suggestions in order:
- Try the latest code of deepks-kit. We recently fixed a bug for the band gap label.
- Change the factors of force, stress, and bandgap to lower values. I notice that after adding these labels to the training, the energy error increases quite a lot.
- Change the params in scf_abacus.yaml, e.g., a smaller mixing_beta or a larger scf_nmax.
- Change start_lr in params.yaml to a lower value in iter.init and after (a sketch of these config tweaks follows this list).
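For concreteness, here is a sketch of the last two tweaks. The key names follow ABACUS INPUT conventions (mixing_beta, scf_nmax), but the values and the flat key layout are illustrative assumptions; check how your scf_abacus.yaml and params.yaml are actually nested before applying.

```python
# Illustrative sketch of the config tweaks above -- values and the
# flat key layout are ASSUMPTIONS; verify against your own files.
import yaml

with open("scf_abacus.yaml") as f:
    scf = yaml.safe_load(f)
scf["mixing_beta"] = 0.2   # smaller beta damps charge mixing
scf["scf_nmax"] = 300      # allow more SCF iterations per frame
with open("scf_abacus.yaml", "w") as f:
    yaml.safe_dump(scf, f)

with open("params.yaml") as f:
    params = yaml.safe_load(f)
params["start_lr"] = 1.0e-4  # lower starting learning rate
with open("params.yaml", "w") as f:
    yaml.safe_dump(params, f)
```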
Thank you for your advice; I will keep trying as you suggested. Is the deepks version I have the latest one? deepks 0.2.dev338+gbf7175b pypi_0 pypi
Oh! The bug is fixed in the develop branch of https://github.com/MCresearch/DeePKS-L/tree/develop. We updated the code in that repo recently.
After updating the code, the SCF convergence rate for iter.init, iter.00, and iter.01 can reach 1, but the SCF behavior of later iterations is still not good, and the current parameter tuning has not yet achieved good results. While analysing the output files of 01.train, I have some questions.
- In the "test.out" file, which two energies do real_ene and pred_ene refer to, and why is real_ene different for different iters?
- Is it possible to judge the 00.scf situation of the next iter from the output files of 01.train? The SCF calculation takes a long time, so adjusting parameters and rerunning SCF is very time-consuming. Any suggestions for adjusting the parameters? Is there any reference or basis for adjusting them?
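Here is how I currently read test.out, as a sketch. The path and the column order (real_ene first, pred_ene second) are assumptions based on the file header; please verify against your own output.

```python
# Sketch: compare the real_ene and pred_ene columns of test.out.
# ASSUMPTIONS: the path is hypothetical, and columns are ordered
# (real_ene, pred_ene) -- check the header of your own file.
import numpy as np

data = np.loadtxt("iter.02/01.train/test.out")  # hypothetical path
real_ene, pred_ene = data[:, 0], data[:, 1]
err = pred_ene - real_ene
print(f"MAE       = {np.abs(err).mean():.6e}")
print(f"RMSE      = {np.sqrt((err ** 2).mean()):.6e}")
print(f"max |err| = {np.abs(err).max():.6e}")
```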