Train: ham_to_feature.py, line 95, in block_to_feature
Describe the bug
Hi developers, I am trying to use E3TB to train on my dataset, and I encounter the following error:
DEEPTB INFO ------------------------------------------------------------------
DEEPTB INFO Cutoff options:
DEEPTB INFO
DEEPTB INFO r_max : {'Nb': 8.0, 'O': 7.0, 'Cl': 7.0}
DEEPTB INFO er_max : None
DEEPTB INFO oer_max : None
DEEPTB INFO ------------------------------------------------------------------
DEEPTB INFO A public `info.json` file is provided, and will be used by the subfolders who do not have their own `info.json` file.
Processing dataset...
Loading data: 0%| | 0/1 [00:00<?, ?it/s]/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/AtomicData.py:963:
UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors.
This means writing to this tensor will result in undefined behavior.
You may want to copy the array to protect its data or make it writable before converting it to a tensor.
This type of warning will be suppressed for the rest of this program.
(Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:199.)
cell_tensor = torch.as_tensor(temp_cell, device=out_device, dtype=out_dtype)
Loading data: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/miniconda3/envs/dptb/bin/dptb", line 8, in <module>
sys.exit(main())
............................(There are still a few lines that have not been copied)................................
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/utils/torch_geometric/dataset.py", line 175, in _process
self.process()
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_base_datasets.py", line 209, in process
data = self.get_data() ## get data returns either a list of AtomicData class or a data dict
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_default_dataset.py", line 384, in get_data
subdata_list = subdata.toAtomicDataList(self.transform)
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_default_dataset.py", line 294, in toAtomicDataList
block_to_feature(atomic_data, idp, features, overlaps)
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/interfaces/ham_to_feature.py", line 95, in block_to_feature
onsite_out[feature_slice] = block_ij.flatten()
ValueError: could not broadcast input array from shape (4,) into shape (5,)
Expected behavior
Can you help me with this? I don't know where to start. Looking forward to your reply.
To Reproduce
I used dftio to parse the dataset. I am not sure whether there is a problem in this step.
Environment
No response
Additional Context
No response
Hi, from my current observation, this is most likely due to a mismatch between the basis in your model input file and the true basis of your DFT computation.
In your parsed dataset, the basis is:

{'Nb': '4s2p2d1f', 'Cl': '2s2p1d', 'O': '2s2p1d'}

While in your input file, it is:

"basis": { "Nb": "4s2p2d1f", "O": "3s2p2d", "Cl": "3s2p2d" }

You can clearly see the mismatch. Please align the input file with your DFT basis and try again.
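To see how such a basis mismatch produces a broadcast shape error like the one in the traceback, one can count the orbitals each basis string implies. The helper below is hypothetical (not part of dptb) and assumes each shell of angular momentum l contributes 2l+1 spherical orbitals:

```python
import re

# Hypothetical helper, not part of dptb: count spherical orbitals in a
# basis string such as "4s2p2d1f" (4 s-shells, 2 p-shells, 2 d, 1 f).
L_DEGENERACY = {"s": 1, "p": 3, "d": 5, "f": 7}  # 2l + 1 orbitals per shell

def n_orbitals(basis: str) -> int:
    return sum(int(n) * L_DEGENERACY[l]
               for n, l in re.findall(r"(\d+)([spdf])", basis))

parsed = {"Nb": "4s2p2d1f", "Cl": "2s2p1d", "O": "2s2p1d"}   # from dftio
model  = {"Nb": "4s2p2d1f", "Cl": "3s2p2d", "O": "3s2p2d"}   # from input.json

for elem in parsed:
    n_data, n_model = n_orbitals(parsed[elem]), n_orbitals(model[elem])
    flag = "OK" if n_data == n_model else "MISMATCH"
    print(f"{elem}: dataset={n_data} orbitals, model={n_model} orbitals -> {flag}")
```

For Cl and O the dataset provides 13 orbitals per atom while the model expects 19, so the Hamiltonian blocks parsed from the data cannot be written into the feature slices the model allocates.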
Also, I see you have set a validation loss function without a validation dataset; this will also trigger an error. Please remove the validation loss function setting or add a validation dataset to proceed.
Zhanghao
Thank you for your reply and reminder.
I have modified the orbital data so that it matches (thank you for your careful check; I had read it several times without finding such a subtle error) and removed the validation setting. I have just started trying out the E3 model and am still exploring it. However, there are relatively few tutorials available online, and I am not very familiar with the parameter settings yet.
By the way, does the data preparation for the SK model have to start from DFTB? Sorry, I also have limited knowledge of ASE. I noticed in the examples that MD data is required. Could you clarify which software's MD output is supported? Does the output data need to be preprocessed? The tutorial mentions that files like atomic_numbers.dat, pbc.dat, cell.dat, and positions.dat can be provided instead of .traj format files. Does this mean that the data obtained from dftio for the E3 model can also be used for the SK model?
Thank you once again.
Hi, thanks.
For the document about parameter settings of E3 mode, please see:
- https://deeptb.readthedocs.io/en/latest/quick_start/input.html#
- https://deeptb.readthedocs.io/en/latest/advanced/e3tb/advanced_input.html

Here is an example: https://deeptb.readthedocs.io/en/latest/quick_start/hands_on/e3tb_hands_on.html

Indeed, the documentation does not cover all details of the E3 mode's parameters; we are working to amend this and are very happy to provide support.
The data preparation of the SK mode does not have to start with DFTB; you can train from scratch. From our experience, training from scratch works well as long as the band structure is not heavily degenerate and does not have many crossings. In other cases, starting from DFTB is the better choice.
The MD trajectory is stored in ASE's trajectory format; it is not tied to any particular MD software, and almost any software's trajectory can be read and converted by ASE. Please see: https://wiki.fysik.dtu.dk/ase/ase/io/trajectory.html
Yes, the data obtained for "E3", or more accurately from dftio, can be used in SK mode. dftio serves as our data processing tool, so in either case you can safely process your data via dftio.
Please feel free to contact me if anything is needed.
Zhanghao
Thank you for patiently forwarding these links; I have read through them all.
I have a new question regarding the SK model: if I want to train a system with three types of atoms (which involves six different bond types), how should the cutoff radius r_max be set? I tried to use a value that covers all first-neighbour bonds.
I am trying to train the SK model using data parsed by dftio. Following the silicon example, I modified my input file but encountered the following error. Could you help me check what might be causing this issue?
..........................
File "/dptb/lib/python3.10/site-packages/dptb/nnops/trainer.py", line 194, in epoch
self.iteration(ibatch)
File "/dptb/lib/python3.10/site-packages/dptb/nnops/trainer.py", line 107, in iteration
loss = self.train_lossfunc(batch, batch_for_loss)
File "/dptb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/dptb/lib/python3.10/site-packages/dptb/nnops/loss.py", line 198, in forward
eig_pred_cut = eig_pred_cut - eig_pred_cut.reshape(-1).min()
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0.
Specify the reduction dim with the 'dim' argument.
Reproduce: dptb train ./input.json -o ./outputtest
Thank you for your patience and time. Looking forward to your reply.
Hi, there is a simple but very convenient function in dptb to check the bond distribution of your structure.
usage: dptb bond [-h] [-ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-lp LOG_PATH] [-acc ACCURACY] [-c CUTOFF] struct
positional arguments:
struct the structure input, must be ase readable structure format
optional arguments:
-h, --help show this help message and exit
-ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0}
set verbosity level by string or number, 0=ERROR, 1=WARNING, 2=INFO and 3=DEBUG (default: INFO)
-lp LOG_PATH, --log-path LOG_PATH
set log file to log messages to disk, if not specified, the logs will only be output to console (default: None)
-acc ACCURACY, --accuracy ACCURACY
The accuracy to judge whether two bond are the same. (default: 0.001)
-c CUTOFF, --cutoff CUTOFF
The cutoff radius of bond search. (default: 6.0)
For example, for your structure, the result will look like:
dptb bond ./test.vasp -acc 0.01 -c 5.0
Bond Type 1 2 3 4 5 6 7 8 9
------------------------------------------------------------------------------------------------------------------------
Nb-Cl 2.46 2.54 4.52 4.54 4.81 4.88
Nb-Nb 2.98 3.79 3.97 4.96
Nb-O 1.83 2.14 3.53 3.71 4.16 4.31
Cl-O 3.16 3.18 3.19 3.22 4.88
Cl-Cl 3.37 3.39 3.91 3.94 3.97 4.08 4.12 4.35 4.97
O-O 3.08 3.69 3.97
The r_max is a hard cutoff that excludes any bond beyond that value. For example, if you start with the first neighbour, r_max should be larger than 1.83 but smaller than 2.46.
Hope this helps.
Meanwhile, since r_max can be a dict, you can choose a suitable cutoff for each atom. Just keep in mind that (cutoff of A + cutoff of B) / 2 is the maximum length of an A-B bond that will be kept.
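Using the r_max dict from the log earlier in this thread purely as an illustration, the per-pair cutoffs implied by this rule can be computed as:

```python
# Per-element r_max dict, taken from the log above for illustration only.
r_max = {"Nb": 8.0, "O": 7.0, "Cl": 7.0}

def bond_cutoff(a: str, b: str) -> float:
    """Effective cutoff for an A-B bond: (r_max[A] + r_max[B]) / 2."""
    return (r_max[a] + r_max[b]) / 2.0

for a, b in [("Nb", "O"), ("Nb", "Cl"), ("O", "Cl"), ("Nb", "Nb")]:
    print(f"{a}-{b} cutoff: {bond_cutoff(a, b):.2f} Angstrom")
```

Comparing these pairwise cutoffs against the output of dptb bond tells you exactly which neighbour shells each element pair will include.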
The error is caused by the band window: band_min should be larger than or equal to 0. Please read our doc on the info.json settings in SK mode:
https://deeptb.readthedocs.io/en/latest/quick_start/input.html#data-settings-info-json
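A minimal illustration of the mechanism behind that RuntimeError (assuming, as in this case, that a bad band window selects zero eigenvalues before the loss is computed):

```python
import torch

# When the band window selects no eigenvalues, the tensor handed to the
# loss is empty, and calling .min() on an empty tensor raises the same
# kind of RuntimeError as in the traceback above.
eig_pred_cut = torch.empty(0)
try:
    eig_pred_cut.min()
except RuntimeError as err:
    print(type(err).__name__, "-", err)
```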
Hi, after some attempts, I have successfully run the SK model. However, I noticed that the training loss initially decreased but then increased, reaching a minimum around 10^-3 before eventually stabilizing at approximately 0.01. Is this level of accuracy sufficient for practical applications? Additionally, when I tried to plot the bands, I still encountered the "not positive-definite" issue. Could you provide some guidance on these problems?