The "skip" issue encountered after running the inference

Open AIM132 opened this issue 1 year ago • 7 comments

python -m inference --config default_inference_args.yaml --protein_path ./rec_1.pdb --ligand ./ZINC01535869.mol2 --out_dir ./outache/ --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise

The following error occurred and there was no output. Does anyone know how to solve it? Thanks.

Skipping complex_0 because of the error: 'X'
HAPPENING | The confidence dataset did not contain ['complex_0']. We are skipping this complex.
1it [00:00, 1.13it/s]
Failed for 0 complexes
Skipped 1 complexes

AIM132 avatar Apr 24 '24 04:04 AIM132

I had the same issue when I ran the inference file directly, but somehow it worked when I used app/main.py.

zeqri avatar Apr 24 '24 18:04 zeqri

Thanks, I'll try it.

AIM132 avatar Apr 25 '24 02:04 AIM132

I had a similar issue using DiffDock v1.1.1.

However, it works with the example data provided in https://github.com/gcorso/DiffDock/tree/main/data/1a0q:

python -m inference --config diffdock_default_inference_args.yaml --protein_path 1a0q_protein_processed.pdb --ligand_description 1a0q_ligand.sdf --out_dir test1-1a0q

But it fails with my data:

python -m inference --config diffdock_default_inference_args.yaml --protein_path AF-P00519-F1-model_v4.pdb --ligand_description DB00619.sdf --out_dir AF-P00519-F1-model_v4/DB00619

This raises the following error:

Processing 1 of 1 batches (1 sequences)
0it [00:00, ?it/s]/app/diffdock/datasets/parse_chi.py:91: RuntimeWarning: invalid value encountered in cast
  Y = indices.astype(int)
[2024-Apr-27 13:01:30 CEST] WARNING -The test dataset did not contain complex_0 for DB00619.sdf and AF-P00519-F1-model_v4.pdb. We are skipping this complex.
1it [00:00,  1.04it/s]
[2024-Apr-27 13:01:30 CEST] WARNING -
    Failed for 0 / 1 complexes.
    Skipped 1 / 1 complexes.

Skipping complex_0 because of the error:
Sizes of tensors must match except in dimension 1. Expected size 1130 but got size 1022 for tensor number 1 in the list.

These two files can be used for testing after removing the .txt extension.

Is there anything wrong with my input data? How can I solve this issue?

phupe avatar Apr 27 '24 11:04 phupe

Further test:

It fails with AF-P00519-F1-model_v4.pdb and 1a0q_ligand.sdf:

python -m inference --config diffdock_default_inference_args.yaml --protein_path AF-P00519-F1-model_v4.pdb --ligand_description 1a0q_ligand.sdf --out_dir AF-P00519-F1-model_v4/DB00619

It raises this error:

Generating ESM language model embeddings
Processing 1 of 1 batches (1 sequences)
0it [00:00, ?it/s]/app/diffdock/datasets/parse_chi.py:91: RuntimeWarning: invalid value encountered in cast
  Y = indices.astype(int)
[2024-Apr-27 13:08:53 CEST] WARNING -The test dataset did not contain complex_0 for 1a0q_ligand.sdf and AF-P00519-F1-model_v4.pdb. We are skipping this complex.
1it [00:00,  1.10it/s]
[2024-Apr-27 13:08:53 CEST] WARNING -
    Failed for 0 / 1 complexes.
    Skipped 1 / 1 complexes.

Skipping complex_0 because of the error:
Sizes of tensors must match except in dimension 1. Expected size 1130 but got size 1022 for tensor number 1 in the list.

But it works with 1a0q_protein_processed.pdb and DB00619.sdf:

python -m inference --config diffdock_default_inference_args.yaml --protein_path 1a0q_protein_processed.pdb --ligand_description DB00619.sdf --out_dir 1a0q_protein/DB00619

This means the error comes from the PDB file.

Note that I also tried keeping only the lines starting with ATOM in the PDB file:

AF-P00519-F1-model_v4-truncated.pdb.txt

But there is still the same error.
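
For reference, the ATOM-only filtering described above could be done along these lines. This is a minimal sketch, not the script actually used for the truncated file; the function name `keep_atom_records` is hypothetical.

```python
def keep_atom_records(pdb_text: str) -> str:
    """Return only the lines of a PDB file that start with ATOM.

    HETATM, TER, header and END records are all dropped, mirroring the
    manual filtering described in this comment.
    """
    kept = [line for line in pdb_text.splitlines() if line.startswith("ATOM")]
    # Re-join with a trailing newline per record; empty input gives "".
    return "".join(line + "\n" for line in kept)
```

To apply it to a file, read the text with `pathlib.Path(...).read_text()` and write the returned string to a new path. As noted above, this filtering alone did not remove the error.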

The PDB file comes from the AlphaFold website:

  • https://alphafold.ebi.ac.uk/entry/P00519
  • https://alphafold.ebi.ac.uk/files/AF-P00519-F1-model_v4.pdb

phupe avatar Apr 27 '24 12:04 phupe

A last test was successful, using this PDB produced directly by an AlphaFold run I did myself:

python -m inference --config diffdock_default_inference_args.yaml --protein_path ranked_0.pdb --ligand_description DB00619.sdf --out_dir ranked_0/DB00619

ranked_0.pdb.txt

phupe avatar Apr 27 '24 13:04 phupe

This is possibly because of the protein length restriction (1022 residues) on DiffDock inference. Refer to issue #199.
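
A quick way to check a PDB file against that limit before running inference is to count the residues per chain. The sketch below is not part of DiffDock: the function names are hypothetical, the value 1022 is simply the limit quoted in this thread, and the parsing assumes standard fixed-column PDB ATOM records.

```python
from collections import defaultdict

# Limit quoted in this thread; an assumption here, not read from DiffDock.
MAX_RESIDUES = 1022


def residues_per_chain(pdb_text: str) -> dict:
    """Count distinct residues per chain from the ATOM records of a PDB file."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        # Skip non-ATOM records and lines too short to hold the residue fields.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]     # column 22: chain identifier
        residue = line[22:27]   # columns 23-27: residue number + insertion code
        seen[chain_id].add(residue)
    return {chain: len(ids) for chain, ids in seen.items()}


def chains_over_limit(pdb_text: str, limit: int = MAX_RESIDUES) -> list:
    """Return the chain IDs whose residue count exceeds the limit."""
    counts = residues_per_chain(pdb_text)
    return [chain for chain, n in counts.items() if n > limit]
```

Calling `chains_over_limit(Path("AF-P00519-F1-model_v4.pdb").read_text())` (with `from pathlib import Path`) would list any chain longer than the limit. The sizes in the error message above (1130 versus 1022) appear consistent with a chain exceeding it.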

prathithbhargav avatar Apr 27 '24 20:04 prathithbhargav

Thank you @prathithbhargav for this information, I was not aware of this limitation.

phupe avatar Apr 28 '24 20:04 phupe