
data preprocessing

Open CunliangGeng opened this issue 6 years ago • 1 comments

The PDB and PSSM files must be preprocessed before they are used in DeepRank. The requirements are:

PDB

  • The reference PDB file must be named caseID.pdb, where caseID (e.g. 7CEI) must not contain an underscore (_).
  • Each decoy PDB file must be named caseID_*.pdb, e.g. 7CEI_1w.pdb.
  • The chain IDs in the PDB file must be A and B, and their order matters: chain A must come before chain B in the file.
  • The index (rowID) of the ATOM lines must start from 1 and be continuous, so lines such as TER and HETATM must be removed.
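
The PDB requirements above can be checked before running DeepRank. The following is only a minimal sketch, not part of DeepRank itself: the function names `check_pdb_name` and `check_pdb_lines` are hypothetical, and it assumes that "index or rowID" refers to the atom serial number in columns 7-11 of each ATOM line and that the chain ID sits in column 22, as in the standard PDB format.

```python
import re
from pathlib import Path


def check_pdb_name(filename, is_decoy):
    """Return True if the file name follows the caseID naming rule."""
    path = Path(filename)
    if path.suffix != ".pdb":
        return False
    # Reference: caseID without "_". Decoy: caseID, then "_" and a suffix.
    pattern = r"[^_]+_.+" if is_decoy else r"[^_]+"
    return re.fullmatch(pattern, path.stem) is not None


def check_pdb_lines(lines):
    """Return a list of problems found in the lines of a PDB file."""
    problems = []
    chain_order = []       # chains in order of appearance, e.g. ["A", "B"]
    expected_serial = 1    # assumption: "index" means the atom serial number
    for number, line in enumerate(lines, start=1):
        record = line[:6].strip()
        if record in ("TER", "HETATM"):
            problems.append(f"line {number}: {record} line should be removed")
            continue
        if record != "ATOM":
            continue
        serial = int(line[6:11])
        if serial != expected_serial:
            problems.append(
                f"line {number}: atom index {serial}, expected {expected_serial}"
            )
        expected_serial = serial + 1
        chain = line[21:22]
        if not chain_order or chain_order[-1] != chain:
            chain_order.append(chain)
    if chain_order != ["A", "B"]:
        problems.append(f"chains appear as {chain_order}, expected ['A', 'B']")
    return problems
```

An empty list from `check_pdb_lines(open("7CEI_1w.pdb"))` would mean that none of these particular checks failed; it does not guarantee that the file meets every requirement DeepRank has.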

PSSM (new format)

  • For a given PDB file caseID_*.pdb, the PSSM files must be named caseID_*.A.pdb.pssm and caseID_*.B.pdb.pssm.
  • It is OK to use the reference PDB's PSSM files, caseID.A.pdb.pssm and caseID.B.pdb.pssm, for the decoy PDBs.
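
To illustrate the naming rule, the expected PSSM file names can be derived from a PDB file name. This is a hedged sketch: `pssm_candidates` is a hypothetical helper, not a DeepRank function, and it assumes the case ID is everything before the first underscore in the file name.

```python
from pathlib import Path


def pssm_candidates(pdb_filename):
    """Return (own, reference) PSSM file names for chains A and B."""
    stem = Path(pdb_filename).stem        # e.g. "7CEI_1w"
    case_id = stem.split("_", 1)[0]       # e.g. "7CEI" (assumed: text before first "_")
    own = [f"{stem}.{chain}.pdb.pssm" for chain in ("A", "B")]
    reference = [f"{case_id}.{chain}.pdb.pssm" for chain in ("A", "B")]
    return own, reference
```

For the decoy `7CEI_1w.pdb`, the first list holds the decoy-specific names and the second holds the reference PDB's PSSM names, which the rule above allows as a substitute.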

CunliangGeng, Oct 04 '19 13:10

@LilySnow We should update the docs with this information.

CunliangGeng, Feb 18 '20 16:02