deeprank
data preprocessing
The PDB and PSSM files must be preprocessed before they are used in DeepRank. The requirements are:
PDB
- the reference PDB file must be named `caseID.pdb`, where `caseID` (e.g. `7CEI`) does not contain `_`.
- the decoy PDB file must be named `caseID_*.pdb`, e.g. `7CEI_1w.pdb`.
- the chain IDs in the PDB file must be A and B, and the order matters: chain A must come before chain B in the file.
- the index (rowID) must start from 1 and be continuous for `ATOM` lines, which means you have to remove lines such as `TER` and `HETATM`.
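The PDB rules above can be checked and applied with a few lines of Python. This is only a sketch, not part of DeepRank: the helper names `classify_pdb_name`, `clean_atom_lines` and `chain_order_ok` are hypothetical, and it assumes that the "index or rowID" is the atom serial number stored in columns 7-11 of an `ATOM` record and that the chain ID sits in column 22, as in the standard PDB format.

```python
import re

# Hypothetical helpers for illustration; they are not DeepRank functions.
REFERENCE_NAME = re.compile(r"^[^_.]+\.pdb$")    # caseID.pdb, no "_" in caseID
DECOY_NAME = re.compile(r"^[^_.]+_.+\.pdb$")     # caseID_*.pdb


def classify_pdb_name(filename):
    """Return 'reference', 'decoy', or None for a bare file name."""
    if REFERENCE_NAME.match(filename):
        return "reference"
    if DECOY_NAME.match(filename):
        return "decoy"
    return None


def clean_atom_lines(pdb_lines):
    """Keep only ATOM records and renumber their serial field from 1."""
    cleaned = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # drops TER, HETATM and every other record type
        serial = len(cleaned) + 1
        # Assumption: columns 7-11 (string index 6:11) hold the atom serial.
        cleaned.append(line[:6] + f"{serial:5d}" + line[11:])
    return cleaned


def chain_order_ok(atom_lines):
    """True if only chains A and B occur and every A line precedes every B line."""
    chains = [line[21] for line in atom_lines]  # column 22 is the chain ID
    return set(chains) == {"A", "B"} and chains == sorted(chains)
```

Running `clean_atom_lines` first and `chain_order_ok` on its output keeps the two concerns separate: the first fixes what can be fixed automatically, the second only reports a problem that needs manual reordering.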
PSSM (new format)
- for a given PDB file `caseID_*.pdb`, its PSSM files must be named `caseID_*.A.pdb.pssm` and `caseID_*.B.pdb.pssm`.
- it is OK to use the reference PDB's PSSM files, `caseID.A.pdb.pssm` and `caseID.B.pdb.pssm`, for decoy PDBs.
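The PSSM naming rule can be written down as a small lookup. The sketch below is a hypothetical helper, not DeepRank's own loader: `pssm_names` and `resolve_pssm` are made-up names, and the fallback to the reference files is simply one way to apply the second rule above. It says nothing about how DeepRank itself searches for PSSM files.

```python
from pathlib import Path


def pssm_names(pdb_filename, chains=("A", "B")):
    """Expected PSSM file names for a PDB file such as 7CEI_1w.pdb."""
    stem = Path(pdb_filename).stem  # e.g. '7CEI_1w'
    return [f"{stem}.{chain}.pdb.pssm" for chain in chains]


def resolve_pssm(pdb_filename, pssm_dir):
    """Prefer PSSM files named after the PDB; otherwise use the reference ones."""
    pssm_dir = Path(pssm_dir)
    own = [pssm_dir / name for name in pssm_names(pdb_filename)]
    if all(path.is_file() for path in own):
        return own
    # caseID contains no "_", so everything before the first "_" is the caseID.
    case_id = Path(pdb_filename).stem.split("_")[0]
    return [pssm_dir / name for name in pssm_names(case_id + ".pdb")]
```

For example, `pssm_names("7CEI_1w.pdb")` gives `7CEI_1w.A.pdb.pssm` and `7CEI_1w.B.pdb.pssm`, while `resolve_pssm` returns the `7CEI.A.pdb.pssm` and `7CEI.B.pdb.pssm` paths when the decoy-specific files are absent.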
@LilySnow The doc should be updated with this information.