Fine-tuning RFdiffusion for de novo antibody design
Hi, I am referring to the paper "Atomically accurate de novo design of single-domain antibodies". It has a section titled "Fine-tuning RFdiffusion for antibody design":
"RFdiffusion uses the AlphaFold2[14]/RF2 frame representation of protein backbones comprising the Cα coordinate and N-Cα-C rigid orientation for each residue. During training, a noising schedule is used that, over a set number of “timesteps” (T), corrupts the protein frames to distributions indistinguishable from random distributions (Cα coordinates are corrupted with 3D Gaussian noise, and residue orientations with Brownian motion on SO3). During training, a PDB structure and a random timestep (t) are sampled, and t noising steps are applied to the structure. RFdiffusion predicts the de-noised (pX0) structure at each timestep, and a mean squared error (m.s.e.) loss is minimized between the true structure (X0) and the prediction. At inference time, translations are sampled from the 3D Gaussian and uniform rotational distributions (XT) and RFdiffusion iteratively de-noises these frames to generate a new protein structure. To explore the design of antibodies, we fine-tuned RFdiffusion predominantly on antibody complex structures (Fig. 1; Methods). At each step of training, an antibody complex structure is sampled, along with a random timestep (t), and this number of noise steps are added to corrupt the antibody structure (but not the target structure). To permit specification of the framework structure and sequence at inference time, the framework sequence and structure are provided to RFdiffusion during training (Fig. 1B). Because it is desirable for the rigid body position (dock) between antibody and target to be designed by RFdiffusion along with the CDR loop conformations, the framework structure is provided in a global-frame-invariant manner during training (Fig. 1C). We utilize the “template track” of RF/RFdiffusion to provide the framework structure as a 2D matrix of pairwise distances and dihedral angles between each pair of residues (a representation from which 3D structures can be accurately recapitulated)[15] (Extended Data Fig. 1A).
The framework and target templates specify the internal structure of each protein chain, but not their relative positions in 3D space (in this work we keep the sequence and structure of the framework region fixed, and focus on the design solely of the CDRs and the overall rigid body placement of the antibody against the target). In vanilla RFdiffusion, de novo binders can be targeted to specific epitopes at inference time through training with an additional one-hot encoded “hotspot” feature, which provides some fraction of the residues the designed binder should interact with. For antibody design, where we seek CDR-loop-mediated interactions, we adapt this feature to specify residues on the target protein with which CDR loops interact (Fig. 1D)."
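To make the noising step in the quoted passage concrete, here is a minimal NumPy sketch of how I read it: Gaussian noise on the Cα coordinates and a random walk on the residue orientations, applied only to the antibody residues. The function names, the step sizes `sigma_trans` and `sigma_rot`, and the plain per-step random walk are my own assumptions for illustration; this is not the actual RFdiffusion noise schedule or code.

```python
import numpy as np

def rotvec_to_matrix(v):
    """Rodrigues' formula: axis-angle vectors (N, 3) -> rotation matrices (N, 3, 3)."""
    theta = np.linalg.norm(v, axis=-1, keepdims=True)
    k = v / np.maximum(theta, 1e-12)          # unit rotation axes
    K = np.zeros(v.shape[:-1] + (3, 3))       # skew-symmetric cross-product matrices
    K[..., 0, 1], K[..., 0, 2] = -k[..., 2], k[..., 1]
    K[..., 1, 0], K[..., 1, 2] = k[..., 2], -k[..., 0]
    K[..., 2, 0], K[..., 2, 1] = -k[..., 1], k[..., 0]
    s = np.sin(theta)[..., None]
    c = np.cos(theta)[..., None]
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def noise_frames(ca, rots, t, mask, sigma_trans=1.0, sigma_rot=0.1, rng=None):
    """Apply t toy noising steps to the residues selected by `mask`.

    ca:   (L, 3) C-alpha coordinates
    rots: (L, 3, 3) per-residue orientations
    mask: (L,) bool, True for antibody residues (noised), False for target (kept)
    sigma_trans / sigma_rot are hypothetical step sizes, not published values.
    """
    rng = np.random.default_rng() if rng is None else rng
    ca, rots = ca.copy(), rots.copy()
    n = int(mask.sum())
    for _ in range(t):
        # Translations: add 3D Gaussian noise.
        ca[mask] += sigma_trans * rng.normal(size=(n, 3))
        # Orientations: compose with a small random rotation, which
        # approximates Brownian motion on SO(3) for small steps.
        step = rotvec_to_matrix(sigma_rot * rng.normal(size=(n, 3)))
        rots[mask] = step @ rots[mask]
    return ca, rots
```

With a mask that is True for the antibody chain only, the target frames come back unchanged, which matches the "(but not the target structure)" part of the quote.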
Is there an example of the RFdiffusion commands used for this fine-tuning? In particular, could we see an example of a framework structure represented as a 2D matrix of pairwise distances and dihedral angles between each pair of residues?
The separate RFantibody repo has more details on how the framework structure is provided as a 2D matrix of pairwise distances and dihedral angles: https://github.com/RosettaCommons/RFantibody.
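As a rough illustration of what such a 2D representation can look like, here is a minimal NumPy sketch that turns backbone coordinates into an L x L distance matrix and an L x L dihedral matrix. The function names, the choice of Cα-Cα distances, and the single trRosetta-style omega dihedral (Cα-Cβ-Cβ-Cα, using idealized virtual Cβ atoms) are assumptions made for this example. The exact features and binning used in RFantibody may differ, so treat this as a sketch and check the repo for the real thing.

```python
import numpy as np

def virtual_cb(n, ca, c):
    """Place an idealized C-beta from backbone N, CA, C coordinates, each (L, 3)."""
    b = ca - n
    c_vec = c - ca
    a = np.cross(b, c_vec)
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points, with broadcasting."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.maximum(np.linalg.norm(b1, axis=-1, keepdims=True), 1e-8)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - (b0 * b1n).sum(-1, keepdims=True) * b1n
    w = b2 - (b2 * b1n).sum(-1, keepdims=True) * b1n
    x = (v * w).sum(-1)
    y = (np.cross(b1n, v) * w).sum(-1)
    return np.arctan2(y, x)

def template_features(n, ca, c):
    """Return (dist, omega), both (L, L), for one chain.

    dist[i, j]  = CA_i to CA_j distance
    omega[i, j] = dihedral CA_i - CB_i - CB_j - CA_j
    """
    cb = virtual_cb(n, ca, c)
    dist = np.linalg.norm(ca[:, None] - ca[None, :], axis=-1)
    omega = dihedral(ca[:, None], cb[:, None], cb[None, :], ca[None, :])
    return dist, omega
```

Both matrices depend only on the internal geometry of the chain, so rotating or translating the whole framework leaves them unchanged. That is the "global-frame-invariant" property the paper relies on: the network sees the framework's internal structure but not where it sits relative to the target.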