Train: ham_to_feature.py, line 95, in block_to_feature
Describe the bug
Hi developers, I am trying to use E3TB to train on my dataset, and I encounter the following error:
DEEPTB INFO ------------------------------------------------------------------
DEEPTB INFO Cutoff options:
DEEPTB INFO
DEEPTB INFO r_max : {'Nb': 8.0, 'O': 7.0, 'Cl': 7.0}
DEEPTB INFO er_max : None
DEEPTB INFO oer_max : None
DEEPTB INFO ------------------------------------------------------------------
DEEPTB INFO A public `info.json` file is provided, and will be used by the subfolders who do not have their own `info.json` file.
Processing dataset...
Loading data: 0%| | 0/1 [00:00<?, ?it/s]/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/AtomicData.py:963:
UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors.
This means writing to this tensor will result in undefined behavior.
You may want to copy the array to protect its data or make it writable before converting it to a tensor.
This type of warning will be suppressed for the rest of this program.
(Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:199.)
cell_tensor = torch.as_tensor(temp_cell, device=out_device, dtype=out_dtype)
Loading data: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/miniconda3/envs/dptb/bin/dptb", line 8, in <module>
sys.exit(main())
............................(There are still a few lines that have not been copied)................................
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/utils/torch_geometric/dataset.py", line 175, in _process
self.process()
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_base_datasets.py", line 209, in process
data = self.get_data() ## get data returns either a list of AtomicData class or a data dict
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_default_dataset.py", line 384, in get_data
subdata_list = subdata.toAtomicDataList(self.transform)
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/dataset/_default_dataset.py", line 294, in toAtomicDataList
block_to_feature(atomic_data, idp, features, overlaps)
File "/miniconda3/envs/dptb/lib/python3.10/site-packages/dptb/data/interfaces/ham_to_feature.py", line 95, in block_to_feature
onsite_out[feature_slice] = block_ij.flatten()
ValueError: could not broadcast input array from shape (4,) into shape (5,)
Expected behavior
Can you help me with this? I don't know where to start. Looking forward to your reply.
To Reproduce
I used dftio to parse the dataset. I am not sure whether there is a problem in this step.
Environment
No response
Additional Context
No response
Hi, from my current observation, this is most likely due to a mismatch between the basis in your model input file and the true basis of your DFT computation.
In your parsed dataset, the basis is:

{'Nb': '4s2p2d1f', 'Cl': '2s2p1d', 'O': '2s2p1d'}

While in your input file, it is:

"basis": { "Nb": "4s2p2d1f", "O": "3s2p2d", "Cl": "3s2p2d" }

You can clearly see the mismatch. Please align the input file with your DFT basis and try again.
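To see how such a basis mismatch produces a broadcast shape error like the one in the traceback, one can count the orbitals each basis string implies. The helper below is hypothetical (not part of dptb) and assumes each shell of angular momentum l contributes 2l+1 spherical orbitals:

```python
import re

# Hypothetical helper, not part of dptb: count spherical orbitals in a
# basis string such as "4s2p2d1f" (4 s-shells, 2 p-shells, 2 d, 1 f).
L_DEGENERACY = {"s": 1, "p": 3, "d": 5, "f": 7}  # 2l + 1 orbitals per shell

def n_orbitals(basis: str) -> int:
    return sum(int(n) * L_DEGENERACY[l]
               for n, l in re.findall(r"(\d+)([spdf])", basis))

parsed = {"Nb": "4s2p2d1f", "Cl": "2s2p1d", "O": "2s2p1d"}   # from dftio
model  = {"Nb": "4s2p2d1f", "Cl": "3s2p2d", "O": "3s2p2d"}   # from input.json

for elem in parsed:
    n_data, n_model = n_orbitals(parsed[elem]), n_orbitals(model[elem])
    flag = "OK" if n_data == n_model else "MISMATCH"
    print(f"{elem}: dataset={n_data} orbitals, model={n_model} orbitals -> {flag}")
```

For Cl and O the dataset provides 13 orbitals per atom while the model expects 19, so the Hamiltonian blocks parsed from the data cannot be written into the feature slices the model allocates.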
Also, I see you have set a validation loss function without a validation dataset; this will also trigger an error. Please remove the validation loss function setting or add a validation dataset to proceed.
Zhanghao
Thank you for your reply and reminder.
I have modified the orbital data so that it matches (thank you for your careful check; I had read it several times without finding such a subtle error) and removed the validation setting. I have just started trying out the E3 model and am still exploring it. However, there are relatively few tutorials available online, and I am not very familiar with the parameter settings yet.
By the way, does the data preparation for the SK model have to start from DFTB? Sorry, I also have limited knowledge of ASE. I noticed in the examples that MD data is required. Could you clarify which software's MD output is supported? Does the output data need to be preprocessed? The tutorial mentions that files like atomic_numbers.dat, pbc.dat, cell.dat, and positions.dat can be provided instead of .traj format files. Does this mean that the data obtained from dftio for the E3 model can also be used for the SK model?
Thank you once again.
Hi, thanks.
For the document about parameter settings of E3 mode, please see:
- https://deeptb.readthedocs.io/en/latest/quick_start/input.html#
- https://deeptb.readthedocs.io/en/latest/advanced/e3tb/advanced_input.html

Here is an example: https://deeptb.readthedocs.io/en/latest/quick_start/hands_on/e3tb_hands_on.html

Indeed, the documentation does not cover all details of the E3 mode's parameters; we are working to amend this and are very happy to provide support.
The data preparation of the SK mode does not have to start with DFTB; you can train from scratch. From our experience, training from scratch works well as long as the band structure is not heavily degenerate and does not have many crossings. In other cases, starting from DFTB is the better choice.
The MD trajectory is stored in ASE's trajectory format; it is not tied to any particular MD software, and almost any software's trajectory can be read and converted by ASE. Please see: https://wiki.fysik.dtu.dk/ase/ase/io/trajectory.html
Yes, the data obtained for "E3", or more accurately from dftio, can be used in SK mode. dftio serves as our data processing tool, so in either case you can safely process your data via dftio.
Please feel free to contact me if anything is needed.
Zhanghao
Thank you for patiently forwarding these links; I have read through them all.
I have a new question regarding the SK model: if I want to train a system with three types of atoms (which involves six different bond types), how should the cutoff radius r_max be set? I tried to use a value that covers all first-neighbour bonds.
I am trying to train the SK model using data parsed by dftio. Following the silicon example, I modified my input file but encountered the following error. Could you help me check what might be causing this issue?
..........................
File "/dptb/lib/python3.10/site-packages/dptb/nnops/trainer.py", line 194, in epoch
self.iteration(ibatch)
File "/dptb/lib/python3.10/site-packages/dptb/nnops/trainer.py", line 107, in iteration
loss = self.train_lossfunc(batch, batch_for_loss)
File "/dptb/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/dptb/lib/python3.10/site-packages/dptb/nnops/loss.py", line 198, in forward
eig_pred_cut = eig_pred_cut - eig_pred_cut.reshape(-1).min()
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0.
Specify the reduction dim with the 'dim' argument.
Reproduce: dptb train ./input.json -o ./outputtest
Thank you for your patience and time. Looking forward to your reply.
Hi, there is a simple but very convenient function in dptb to check the bond distribution of your structure.
usage: dptb bond [-h] [-ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-lp LOG_PATH] [-acc ACCURACY] [-c CUTOFF] struct
positional arguments:
struct the structure input, must be ase readable structure format
optional arguments:
-h, --help show this help message and exit
-ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0}
set verbosity level by string or number, 0=ERROR, 1=WARNING, 2=INFO and 3=DEBUG (default: INFO)
-lp LOG_PATH, --log-path LOG_PATH
set log file to log messages to disk, if not specified, the logs will only be output to console (default: None)
-acc ACCURACY, --accuracy ACCURACY
The accuracy to judge whether two bond are the same. (default: 0.001)
-c CUTOFF, --cutoff CUTOFF
The cutoff radius of bond search. (default: 6.0)
For example, for your structure, the result will look like:
dptb bond ./test.vasp -acc 0.01 -c 5.0
Bond Type 1 2 3 4 5 6 7 8 9
------------------------------------------------------------------------------------------------------------------------
Nb-Cl 2.46 2.54 4.52 4.54 4.81 4.88
Nb-Nb 2.98 3.79 3.97 4.96
Nb-O 1.83 2.14 3.53 3.71 4.16 4.31
Cl-O 3.16 3.18 3.19 3.22 4.88
Cl-Cl 3.37 3.39 3.91 3.94 3.97 4.08 4.12 4.35 4.97
O-O 3.08 3.69 3.97
The r_max is a hard cutoff that excludes any bond beyond that value. For example, if you start with the first neighbour, r_max should be larger than 1.83 but smaller than 2.46.
Hope this helps.
Meanwhile, since r_max can be a dict, you can choose a suitable cutoff for each atom. Just keep in mind that (cutoff of A + cutoff of B) / 2 is the maximum length of an A-B bond that will be kept.
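Using the r_max dict from the log earlier in this thread purely as an illustration, the per-pair cutoffs implied by this rule can be computed as:

```python
# Per-element r_max dict, taken from the log above for illustration only.
r_max = {"Nb": 8.0, "O": 7.0, "Cl": 7.0}

def bond_cutoff(a: str, b: str) -> float:
    """Effective cutoff for an A-B bond: (r_max[A] + r_max[B]) / 2."""
    return (r_max[a] + r_max[b]) / 2.0

for a, b in [("Nb", "O"), ("Nb", "Cl"), ("O", "Cl"), ("Nb", "Nb")]:
    print(f"{a}-{b} cutoff: {bond_cutoff(a, b):.2f} Angstrom")
```

Comparing these pairwise cutoffs against the output of dptb bond tells you exactly which neighbour shells each element pair will include.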
The error is caused by the band window: band_min should be larger than or equal to 0. Please read our doc on the info.json settings in SK mode:
https://deeptb.readthedocs.io/en/latest/quick_start/input.html#data-settings-info-json
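A minimal illustration of the mechanism behind that RuntimeError (assuming, as in this case, that a bad band window selects zero eigenvalues before the loss is computed):

```python
import torch

# When the band window selects no eigenvalues, the tensor handed to the
# loss is empty, and calling .min() on an empty tensor raises the same
# kind of RuntimeError as in the traceback above.
eig_pred_cut = torch.empty(0)
try:
    eig_pred_cut.min()
except RuntimeError as err:
    print(type(err).__name__, "-", err)
```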
Hi, after some attempts, I have successfully run the SK model. However, I noticed that the training loss initially decreased but then increased, reaching a minimum around 10^-3 before eventually stabilizing at approximately 0.01. Is this level of accuracy sufficient for practical applications? Additionally, when I tried to plot the bands, I still encountered the "not positive-definite" issue. Could you provide some guidance on these problems?