Default procedures for optimizations
I think the whole MP workflow can use a few tune-ups. For structure relaxations, instead of the current dumb "double-relaxation" that custodian does, I suggest the following recipe.
- Start with a looser k-point grid, e.g., 0.5 of the normal grid.
- For the second run, use the actual grid we want plus EDIFFG = -0.05. This last criterion is not used in MPVaspInputSet right now, but is actually critical for getting good structures.
- Check the CONTCAR volume against the POSCAR volume. If the volume change is > 2%, repeat step 2 (see the sketch after this list).
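For concreteness, step 3 could be something like the following (a minimal sketch using pymatgen; the 2% threshold and file names are the ones from the recipe above):

```python
from pymatgen.core import Structure

# Compare the relaxed cell (CONTCAR) to the input cell (POSCAR) and decide
# whether the full-grid run (step 2) should be repeated, per the 2% criterion.
initial = Structure.from_file("POSCAR")
final = Structure.from_file("CONTCAR")
vol_change = abs(final.volume - initial.volume) / initial.volume
repeat_step_2 = vol_change > 0.02
```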
I don't have time to do this myself, and I am a bit confused about what each of the many RunVasp tasks does in the grand scheme of things, so I suggest someone else take the lead. Actually, full_opt_run in Custodian already does this without the EDIFFG part. I will push something shortly that will enable this.
Two questions:
- Why does it help to start with a looser k-point grid? I understand that the first calc will take less time. But then won't the second calc take more time since the solution won't start off in a state that is as well converged?
- I've suggested force convergence (negative EDIFFG) in the past, but Kristin (and maybe Geoffroy?) has told me that it is not completely reliable, i.e., it does not always converge. Thus I have been reluctant to make force convergence the default strategy for high-throughput.
Note that full_opt_run is already supported in the same way as the double relaxation of RunVaspCustodian; just check the docs of RunVaspCustodian. For example, if you want the band structure workflow to use full_opt_run, just switch the job_type in this one line of code:
matmethods/vasp/workflows/automatic/standard.py:23

```python
wf = use_custodian(wf, fw_name_constraint="structure optimization",
                   custodian_params={"job_type": "double_relaxation_run",
                                     "max_force_threshold": 0.25})
```
It helps for the first relaxation because the first relaxation is usually the one that takes the most ionic steps. For example, let's say the first relaxation takes 50 steps and the second relaxation takes 3 steps. By halving the k-points, you may halve or even quarter the time for the first relaxation. Even if the second relaxation then takes a few more steps, you come out ahead. In general, geometry relaxations aren't all that sensitive to the k-point grid, and the first relaxation's data is completely useless except from the standpoint of getting an initial geometry guess.
Let me also state that this is the default strategy if you want HSE calculations done in any reasonable amount of time. Even for small-ish structures, going to something like a 2x2x2 or 4x4x4 grid is painfully slow with HSE. But a Gamma-only 1x1x1 relaxation first helps converge the lattice parameters and positions to something reasonable before you move on to the expensive geometry steps.
As for force convergence, I am not sure what the advice of KP and Geoffroy is based on. As far as I know, it is basically a requirement if you need very accurate structures, e.g., as starting points for NEB calculations. I am not suggesting we impose EDIFFG arbitrarily from the first step, just the second step, and only -0.05, which is still not too tight. For NEB and elastic constants, for example, you typically want -0.02 or even tighter.
In general, I suggest that full_opt_run be the default for all future calculations rather than double relax.
Hi Shyue
I agree that the algorithm you sketched out should give more accurate results and is overall better than arbitrarily picking two relaxations. However, I would still like to investigate the convergence (failure rate) concern before making it the default scheme.
Here is my proposed plan of action:
- I am (anyway) going to create a test set of about 25 workflows that can be used to test changes to MatMethods parameters (~1 week timeframe). In this test set I will try to add a few simple examples of systems known to be problematic for force convergence (simple systems, e.g., unary metals or simple binaries, not some complex crazy thing).
- Initially, I will run the tests to confirm the behavior of MatMethods as written.
- Next, I will switch to the force convergence criterion you outlined. We will compare the failure rate with the default parameters and then make a decision about which parameter set to use as default.
I am optimistic that once the data is there, we will agree on the best default parameter set, i.e., if there is little to no difference, then I am all for switching.
That's fine. I will wait to see the results. BTW, I have implemented a half_kpts option in the relaxation jobs in custodian, so all you need to do for step 3 is turn on the ediffg and half_kpts options.
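For concreteness, a sketch of how those options could be passed through custodian directly (I am assuming the keyword names ediffg and half_kpts_first_relax on VaspJob.double_relaxation_run; check the custodian docs for the exact signature):

```python
from custodian.custodian import Custodian
from custodian.vasp.handlers import VaspErrorHandler
from custodian.vasp.jobs import VaspJob

vasp_cmd = ["mpirun", "vasp_std"]
# First relaxation at half k-points, second at full k-points with a
# force-based (EDIFFG = -0.05) stopping criterion, as discussed above.
jobs = VaspJob.double_relaxation_run(vasp_cmd, ediffg=-0.05,
                                     half_kpts_first_relax=True)
c = Custodian([VaspErrorHandler()], jobs, max_errors=5)
c.run()
```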
Let me chime in that running force convergence with EDIFFG = -0.05 is pretty much necessary before the DFPT runs where I need strict convergence for the piezo tensor. My failure rate on the DFPT goes from ~50% to 0% when doing this. Granted, this is not on simple systems but on complex perovskites.
Can we update the default procedure in atomate to this? It's also important to get rid of the max_force_threshold handler, as it causes lots of problems on the first relaxation when run with shyuep's half k-point/force relax method. The overall failure rate is improved with this methodology for me, and it reduces the failure rate for derivative calculations such as the DFPT for dielectric and piezo properties.
@shyamd @shyuep We revisited this at today's MP meeting
@shyamd is going to talk to Kristin to get her opinion on what MP wants to do. I am fine with switching the default relaxation procedure in atomate to Shyue's new scheme if that's what MP also wants to do by default. We should keep an option for the old scheme though, especially because in some cases we don't care about very careful relaxations and just want a rough answer.
Also:
- @shyuep Have you been running a lot of calculations with ediffg=-0.05 and half_kpoints_first in your group over the past year? If so, what's been your experience?
- @shyamd, the code changes required for atomate seem simple (see the sketch after this list):
  - Set the ediffg default to -0.05 instead of None in OptimizeFW and RunVaspCustodian
  - Add support for half_kpts_first in RunVaspCustodian (right now, the option is always set to False in custodian) and also add that option to OptimizeFW with default=True
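From the user side, the end state might look something like the sketch below (the ediffg and half_kpts_first_relax parameters are the proposed interface described in the bullets above, not necessarily what is released at the time of writing):

```python
from pymatgen.core import Structure
from atomate.vasp.fireworks.core import OptimizeFW

structure = Structure.from_file("POSCAR")
# Proposed defaults: force-based stop for the ionic loop and a looser
# k-point grid for the first of the two relaxations.
fw = OptimizeFW(structure,
                job_type="double_relaxation_run",
                ediffg=-0.05,
                half_kpts_first_relax=True)
```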
Hulk does not run calculations himself anymore (except in highly rare cases). I will ask @johnson1228 and @zbwang to comment.
I have some experience running calculations with EDIFFG (e.g., -0.03) using custodian. These calcs started from the MP double-relaxed structures, and I still encountered many errors (~10%). I would say the error rate is much higher than that of the MP double relaxation when starting from a new crystal structure.
In my experience, the k-points scheme of MP is highly reliable (the k-point density is quite high). This is likely the reason why there is not much difference in the calculated band gap between MP and (MP + EDIFF=1e-5; EDIFFG=-0.01) for the band structure calculation of an insulator. However, for magnetic systems, the EDIFFG has a significant impact on the calculated magnetization.
EDIFFG = -0.05 is a good choice, and it is widely used in surface calculations to minimize the computational cost (compared with EDIFFG = -0.01). In the long run, I agree with SP's recommendation that MP should consider adding the force criterion, given that it is of great importance for some property calculations.
The errors appearing in calculations with the force criterion are handleable. The problem is that Custodian at the moment is missing some error handlers for these calcs. We only need to add new error handlers to Custodian to deal with these situations, so I believe it is feasible to add the force criterion in MP.
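To illustrate, a minimal sketch of the handler interface such a fix would use (the class name and restart action here are illustrative only; custodian's existing UnconvergedErrorHandler already does something along these lines):

```python
import shutil

from custodian.custodian import ErrorHandler
from pymatgen.io.vasp.outputs import Vasprun


class ForceUnconvergedHandler(ErrorHandler):
    """Hypothetical handler: restart from CONTCAR if the ionic loop
    ended without satisfying the force (EDIFFG < 0) criterion."""

    is_monitor = False

    def check(self):
        try:
            vrun = Vasprun("vasprun.xml")
        except Exception:
            return False
        return not vrun.converged_ionic

    def correct(self):
        # Continue the relaxation from the last geometry.
        shutil.copy("CONTCAR", "POSCAR")
        return {"errors": ["ionic relaxation not force-converged"],
                "actions": None}
```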
As an aside, I've had quite a few issues getting efficient force-based optimization in VASP for surfaces with adsorbates, and for a few bulk deformed structures (there's some polymorph of MoO3, for example, that ran for multiple 72-hour cycles on matgen without reaching the force convergence criterion). Based on a few things deep in the VASP manual, it seems like there are occasional issues with the FFT grid being insufficiently dense to get good forces, which manifests as the drift being significantly larger than the calculated forces.
Their proposed solution is to increase the FFT mesh, which I typically do by setting ENAUG to 4x ENCUT (this is actually standard in Quantum Espresso, and is relatively cheap for surfaces). I'm not sure it's absolutely necessary for the majority of the MP workflows, but it might help resolve some of the failed workflows and could be encoded in a custodian handler.
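For example, that override could be applied per calculation roughly like this (a sketch; the 4x factor is the rule of thumb mentioned above, and 520 eV is MPRelaxSet's usual ENCUT):

```python
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet

structure = Structure.from_file("POSCAR")
encut = 520  # MPRelaxSet's standard plane-wave cutoff
# Densify the augmentation FFT grid to reduce drift in the forces.
relax_set = MPRelaxSet(structure,
                       user_incar_settings={"ENAUG": 4 * encut})
relax_set.write_input("relax_with_enaug")
```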
@zbwang could you send me a list of those materials which failed? I could test this using those and a few from my work.
@montoyjh "for example, that ran for multiple 72-hour cycles on matgen without reaching the force convergence criteria)" I do not see your output files but suspect that your calc is trapped in a local minimum. I met this problem b4. Custodian does nothing but just let it run until the ending of walltime. My calc was done quite a long time ago. I am sorry I forgot what these materials are.
@shyamd @shyuep
So I pulled Shyam's code to default ediffg to -0.05 and was going to do an atomate release, but in retrospect I think the implementation approach should be different.
In the current implementation, MPRelaxSet stays the same (energy convergence), but atomate by default "overrides" whatever the input set says to be force convergence (ediffg=-0.05) through custodian.
Earlier, atomate just did whatever the VaspInputSet said unless you specified otherwise.
I prefer the earlier way, where atomate does not override what the input set is saying. In my experience, such overrides lead to really frustrating usage of the code (e.g., someone switches to a VaspInputSet that uses really tight force convergence like -0.001, yet atomate overrides the force convergence to -0.05). I really don't want atomate to be overriding InputSet settings one way or the other unless the user explicitly requests this and knows that it is happening.
Can we instead do one of the following?
Option 1) Revert the atomate ediffg override. Create a new VaspInputSet to reflect the updated MPRelaxSet with force convergence (e.g., MPForceRelaxSet; a sketch follows below). Update atomate to replace MPRelaxSet with the new VaspInputSet.
Option 2) Revert the atomate ediffg override. Directly update MPRelaxSet to use the new scheme (I probably wouldn't suggest this).
Note - I will likely revert the atomate EDIFFG override fairly soon if we cannot converge on a solution.
(Also, there should be a config setting in config.py for atomate where one can specify the default VaspInputSet for relaxation. Thus, to retain legacy behavior, you could switch config.py to say DEFAULT_RELAX_SET = MPRelaxSet or DEFAULT_RELAX_SET = MPForceRelaxSet.)
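For Option 1, a minimal sketch of what such an input set could look like (the name MPForceRelaxSet and the -0.05 value come from this thread; the implementation below is just one possible way to do it):

```python
from pymatgen.io.vasp.sets import MPRelaxSet


class MPForceRelaxSet(MPRelaxSet):
    """Hypothetical input set: identical to MPRelaxSet except for a
    force-based ionic convergence criterion (EDIFFG = -0.05 eV/A)."""

    def __init__(self, structure, **kwargs):
        user_incar = kwargs.pop("user_incar_settings", None) or {}
        user_incar.setdefault("EDIFFG", -0.05)
        super().__init__(structure, user_incar_settings=user_incar, **kwargs)
```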
What should the behavior of custodian be then?
custodian should have ediffg=None. We can keep half_kpts_first=True by default, although again I'd allow a config option to switch that back to False by default if someone wants the legacy behavior.
I like the input set dictating whether it's a force relaxation or an energy relaxation.
@shyamd do you know the status of this issue?
As far as I can tell from browsing the current code, the default structure optimization still uses MPRelaxSet, which still defaults to energy convergence. So we don't have force convergence by default for atomate structure optimizations.
The revised plan for defaulting to force convergence would be to create an MPForceRelaxSet and have atomate default to that, but this is unimplemented.
Is that correct?
There is one caveat that will require a modification in Custodian.
Right now, if you turn on ediffg with a force relaxation criterion and half_kpts in the atomate RunVaspCustodian, you get this: (1) relax on energy with half k-points; (2) relax on forces with full k-points.
This works well and I think we want to make sure we keep it. So the code for the double relaxation job may need some tweaking to do this when the force relaxation criterion comes from the input set.
I also want to share my experience with the current atomate defaults regarding structure optimization and then add my suggestions to make things a bit more strict, though they may not be the most optimal solution. It is better to spend a few more CPU hours to get the structure right than to do a loose structure optimization that makes all the subsequent calculations garbage. This was my recent experience with 120 structures: when I looked at a few of them, the results were inconsistent with MP (mine were worse) for the same initial structures. Please let me know if I am missing something.
- Currently HALF_KPOINTS_FIRST_RELAX defaults to False, so the double relaxation is just repeating the same job but with ISTART=1. I suggest always setting this to True but raising the original reciprocal_density (see below).
- EDIFF_PER_ATOM in MPRelaxSet is 5e-5, which seems a bit high for large systems. I don't know how well-tested this is, but just to be sure I suggest reducing it to at least 1e-5.
- reciprocal_density in MPRelaxSet is very low (64), especially if we are planning to set HALF_KPOINTS_FIRST_RELAX to True, in which case relax1 would fail due to having too few k-points. I suggest raising it to 400 (that is 4/10 of the uniform default and a number at which I get a k-point grid consistent with MP for mp-952566).
- Currently ediffg in OptimizeFW is still passed as None, so the 2nd step is still energy convergence, meaning right now we are just repeating the same calculation twice. I tried changing it to -0.01 but it did not converge. For now, as suggested by others, I suggest changing it to -0.05 to turn the 2nd optimization step into a force relaxation (see the sketch after this list).
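For testing, the overrides in the bullets above can be expressed directly against MPRelaxSet (a sketch; these are the suggested values, not vetted defaults):

```python
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet

structure = Structure.from_file("POSCAR")
relax_set = MPRelaxSet(
    structure,
    # EDIFF given as an absolute value here for simplicity (the bullet
    # above suggests tightening the per-atom default), plus the force
    # criterion for the ionic loop.
    user_incar_settings={"EDIFF": 1e-5, "EDIFFG": -0.05},
    # denser mesh so relax1 still has enough k-points at half density
    user_kpoints_settings={"reciprocal_density": 400},
)
relax_set.write_input("relax_test")
```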
I can rerun my calculations with the suggested settings and report back. If it looks good and you agree, I can make a PR so we make some changes close to the 2nd anniversary of this issue/enhancement.
I will weigh in a little. My recommendation is:
First, there is no need for an MPForceRelaxSet. It is pretty dumb to have an MPForceRelaxSet that replicates MPRelaxSet with just the additional EDIFFG parameter; that is what user_incar_settings is for. Beyond that:
i. Initial relaxation with the default MPRelaxSet and half k-points.
ii. Second relaxation with the default MPRelaxSet and full k-points.
iii. Third and subsequent relaxations with EDIFFG=-0.05 and full k-points. You can optionally merge this with (ii). But what I would ask is that the atomate code check the volume change between the initial and final structure. If it is more than 5%, dynamically add a Firework to do another relaxation.
iv. For static calculations, use EDIFF rather than EDIFF_PER_ATOM by overriding it in MPStaticSet (see the sketch after this list). As @albalu pointed out, EDIFF_PER_ATOM leads to huge errors for large crystals. These errors are tolerable for relaxations, but are unforgiving in the static calculation stage. The energy from the static calculation is what we use as the "true value" for all analysis. Also, the cost of getting tight SCF convergence at the static stage is pretty minimal.
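A sketch of (iv), assuming an absolute EDIFF of 1e-5 (an illustrative value, not a decided one) and a hypothetical relax2 directory name:

```python
from pymatgen.io.vasp.sets import MPStaticSet

# Override the per-atom EDIFF default with an absolute value for the
# static step; "relax2" is an illustrative path to the final relaxation.
static_set = MPStaticSet.from_prev_calc("relax2",
                                        user_incar_settings={"EDIFF": 1e-5})
static_set.write_input("static")
```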
Whatever the decision is, document it in the text comments of the atomate workflow. Do not just rely on people to read the code to figure out what we are trying to do. We can tweak it as we monitor our experience with it.
@zbwang you agree?
- I think (i) and (ii) should be a single Firework with a custodian multi-job.
- Next, (iii) should be a subsequent Firework. This will make it easier to rerun the force convergence starting from a "good" place. The volume check should just be done via a custodian multi-job rather than by adding another Firework. I find that dynamically adding Fireworks is certainly possible but adds a lot of headaches to the system; it's usually cleaner to handle it within custodian.
- For (iv), I don't have much experience. Certainly, if you are trying to compare defect energies, you want the EDIFF to reflect the number of defect atoms, not the total number of atoms, which would support not dividing by the total number of atoms. I just don't have much experience with how this change will actually play out in terms of convergence probability and time. What EDIFF do you suggest? 1e-5?
For the text comments - this can be hard to maintain. E.g., if something changes in one of the pymatgen input sets, we need to track down all the text comments in atomate. For example, the MP wiki page is supposed to have our calculation parameters as text, but I'm pretty sure it's out of date since no one maintains it. I expect something similar will happen with atomate - the text comments will eventually give wrong information since no one will bother to keep them in sync (I won't do it, and I don't trust that anyone else will - see the aforementioned MP example). I'd be more open to the text comments just pointing to the pymatgen classes or YAML files or something rather than repeating that information.
I don't think (iv) will change the probability of convergence much. This is, after all, in the static calculation, where you are already starting from a well-relaxed geometry as well as a CHGCAR.
As for text comments, I will leave it to you. I am not asking for much. In general, I find it helpful to write down the principles behind why we structured something this way. E.g., look at the pymatgen code with significant algorithmic complexity such as Ewald, Slab and StructureMatcher. The comments help one understand the general approach. Even if the parameters change subsequently, e.g., the text comments say step iv is EDIFF = 1e-5 but we subsequently changed it to EDIFF = 1e-4, the principle that we are using EDIFF rather than EDIFF_PER_ATOM and the reasons for doing so remain valid. Outdated comments are better than zero comments.
> Outdated comments are better than zero comments.
Disagree - I'd prefer no information over wrong information any day. If there is no information, then someone is forced to either not know the answer or spend considerable time figuring out the right answer. Although neither of those situations is great, both those situations are better than someone simply reading a wrong comment on how things are (e.g., a wrong parameter value for EDIFF) and then reporting that value in a paper or communicating it to others even though it's not actually what was done.
> look at the pymatgen code with significant algorithmic complexity such as Ewald, Slab and StructureMatcher.
The comments for these are completely different, i.e., they are a proper use of comments, which is to provide context and intention for the code next to them. The EwaldSummation code isn't documenting what is going on or what parameters are being used three function calls down from the code next to it.
When the comments describe the code next to them, there is little risk of them getting outdated. Anyone who updates the code without updating those comments is clearly just doing a bad job.
Anyway, I agree that it is very difficult for people to trace down the procedure of what is going on in the atomate workflows. There are layers upon layers - atomate involves tracking down the correct Firework that runs your calc, which in turn goes to pymatgen, which goes to an InputSet class, which finally may (i) subclass most of its settings from another input set class (e.g., MVLElasticSet), so you need to go further down the tree, or (ii) put the settings in code that might be complex, or (iii) essentially point you to a YAML file that is difficult to locate if it's your first time through the code (e.g., MPHSERelaxSet). It is pretty difficult for most newcomers to get through all these levels and predict what is going to come out in terms of settings, starting all the way at the top in atomate. So I agree some solution is needed; I just don't think atomate itself should be documenting pymatgen parameters, but rather making the atomate code clearer.
Again, when I said comments, I meant useful ones, like saying we use EDIFF instead of EDIFF_PER_ATOM for <reasons>, as specified in MPStaticSet (ref link). No one asked to write "we set EDIFF = 1e-5". The latter kind of comment is useless and fragile. I can read the code to know EDIFF is set to 1e-5 or 1e-4. What I want to know is the rationale for doing something. It can even be something as simple as copying this entire GitHub issue chain and trimming it down to 4-5 short bullets of the logic flow and rationale.