DiffDock

Any way to dock many ligands against the same protein instead of loading the same protein every time?

Open sky1ove opened this issue 1 year ago • 3 comments

I'm testing a dataset of ligands against the same protein. Instead of loading the same protein PDB file every time, is there any way to load the PDB once and then just start docking each new ligand?

sky1ove, Apr 10 '24 16:04

Nothing short of modifying the code a bit will work.

One way would be to "memoize" the function you suspect is the slowest, using joblib's Memory cache. Basically, find the function that causes the most delay and annotate it with the cache decorator of a joblib.Memory object (@memory.cache). AFAIK this is the simplest way to do what you want.
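As a rough illustration of the memoization idea, here is a minimal sketch. It uses the standard library's functools.lru_cache as a stand-in for joblib's cache (same idea, but in-process only, whereas joblib.Memory also persists results to disk). The names load_protein and dock are hypothetical placeholders, not DiffDock functions.

```python
from functools import lru_cache

# Counts how often the expensive step actually runs (for demonstration only).
CALLS = {"parse": 0}


@lru_cache(maxsize=None)
def load_protein(pdb_path: str) -> tuple:
    """Hypothetical stand-in for the expensive protein preprocessing step."""
    CALLS["parse"] += 1
    # Real code would parse the PDB file and build protein features here.
    return ("features-for", pdb_path)


def dock(pdb_path: str, ligand: str) -> str:
    """Hypothetical docking call that reuses the cached protein."""
    protein = load_protein(pdb_path)  # computed once per distinct path
    return f"{ligand} docked against {protein[1]}"


ligands = ["CCO", "c1ccccc1", "CC(=O)O"]
results = [dock("protein.pdb", smiles) for smiles in ligands]
```

The cache key is the function's arguments, so this only helps if the slow function is called with the same (hashable) arguments each time, e.g. the same file path.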

Let me know if this helps.

tornikeo, Apr 25 '24 13:04

What I do locally is memoize the ESM embeddings. The most expensive computation during target preprocessing is ESM, so caching its output reduces preprocessing time considerably. It can be done by modifying the code like this:

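# compute_unique is a helper not shown in this comment; it is used here as a
# dict mapping each distinct protein_info tuple to its sequence string.
unique_sequences = compute_unique(list_of_protein_input)

# Flatten every unique protein into per-chain sequences (chains are ":"-separated).
labels, sequences = [], []
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    sequences.extend(s)
    labels.extend([(*protein_info, j) for j in range(len(s))])

# Run ESM only once per unique protein chain.
lm_embeddings = compute_ESM_embeddings(model, alphabet, labels, sequences)
unique_lm_embeddings = {}
for protein_info, sequence in unique_sequences.items():
    s = sequence.split(":")
    unique_lm_embeddings[protein_info] = [
        lm_embeddings[(*protein_info, j)] for j in range(len(s))
    ]

# Expand back to one entry per input, reusing the shared embeddings.
lm_embeddings = [
    unique_lm_embeddings[protein_info]
    for protein_info in list_of_protein_input
]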

Note that my code is based on a checkout of the v1.0 code.
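For anyone who wants to try the deduplication pattern in isolation, here is a self-contained sketch of the same idea. It is only an approximation: embed_unique and fake_embed are hypothetical names, the embedder is a stub rather than ESM, and the cache is keyed per chain sequence rather than per protein as in the snippet above.

```python
from typing import Callable, Dict, List, Tuple

Embedding = List[float]


def embed_unique(
    inputs: List[Tuple[str, str]],
    embed: Callable[[str], Embedding],
) -> List[List[Embedding]]:
    """Embed each distinct chain once; return per-input lists of chain embeddings."""
    cache: Dict[str, Embedding] = {}
    out: List[List[Embedding]] = []
    for _name, sequence in inputs:
        chains = sequence.split(":")  # chains are ":"-separated
        for chain in chains:
            if chain not in cache:
                cache[chain] = embed(chain)  # the expensive call, run once per chain
        out.append([cache[chain] for chain in chains])
    return out


calls: List[str] = []


def fake_embed(chain: str) -> Embedding:
    """Stub embedder (assumption): records each call and returns a dummy vector."""
    calls.append(chain)
    return [float(len(chain))]


# The same protein repeated for several ligands, plus one different protein.
inputs = [("1abc", "MKT:GGA"), ("1abc", "MKT:GGA"), ("2xyz", "MKT")]
embeddings = embed_unique(inputs, fake_embed)
```

With three inputs and five chains in total, the stub embedder is only invoked for the two distinct chain sequences.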

demian3b, May 03 '24 01:05