How to integrate custom atom features?
Hi, may I ask if I can add the atom positions in atom_feature_utils.py? If I include the coordinates, does this mean I should directly sum the features in the node feature vector from the GNN graph? I apologize for the confusion, but I am not sure where the GNN model is located or where the features get embedded. If you could help me locate where the GNN model or the feature embedding process is, I would humbly thank you
If you add an additional AtomFeatureExtractor in atom_feature_utils.py, and append it to the list of default featurizers, then that could work, but these featurizers only work on individual atoms, whereas your 3D position featurizer would need to take a look at the entire molecule to determine the conformation first.
Maybe a better approach would be to take a look at featurise_atoms in molecule_dataset_utils.py. As you can see, this function builds a NodeFeatures object by gathering features from the featurizers. At this point, the mol is in scope, so you could plug in your 3D position determination there, and then append the resulting positions to the all_atom_features list. Note that you may want to normalize (at least center) those positions before providing them, as the model would treat them like any other atom features, without any rototranslational invariance/equivariance.
I am not 100% sure everything will work out-of-the-box, but I did check that featurise_atoms is also called during decoding (see moler_decoding_utils.py), so plugging your change in that one place might just be sufficient.
Sir, Thank you so much.
Sorry to say that but what method you are use to embedded the coordinate into mol? I have complicate error with Converting graph sample for Cc1ccc(B(O)O)cc1Br failed - aborting!Converting graph sample for O=C(O)C(=O)Nc1sc2c(c1C(=O)O)CC[NH2+]C2 failed - aborting!
sorry to show my code which not very good structure format. Thank you to always help.
- convert 2D or 3D SDF to mol rdkit and use smile to calculate the sa_score, clogp and others
- extract the coordinate and emmbed to NodeFeature below here is my code
def featurise_atoms(
mol: Mol,
atom_feature_extractors: List[AtomFeatureExtractor],
motif_vocabulary: Optional[MotifVocabulary] = None,
motifs: List[MotifAnnotation] = [],
) -> NodeFeatures:
if motif_vocabulary is not None:
atom_type_feature_extractor = next(
featuriser
for featuriser in atom_feature_extractors
if isinstance(featuriser, AtomTypeFeatureExtractor)
)
enclosing_motif_id: Dict[int, int] = {}
for motif in motifs:
motif_id = motif_vocabulary.vocabulary[motif.motif_type]
for atom in motif.atoms:
enclosing_motif_id[atom.atom_id] = motif_id
num_motifs = len(motif_vocabulary.vocabulary)
all_atom_class_ids = []
num_atom_classes = atom_type_feature_extractor.feature_width + num_motifs
else:
assert not motifs
all_atom_class_ids = None
num_atom_classes = None
all_atom_features = []
for atom_id, atom in enumerate(mol.GetAtoms()):
atom_symbol = get_atom_symbol(atom)
atom_features = [
atom_featuriser.featurise(atom) for atom_featuriser in atom_feature_extractors
]
if motif_vocabulary is not None:
motif_or_atom_id = enclosing_motif_id.get(
atom_id, atom_type_feature_extractor.type_name_to_index(atom_symbol) + num_motifs
)
assert motif_or_atom_id < num_atom_classes
all_atom_class_ids.append(motif_or_atom_id)
# Ensure the molecule has a conformer
if mol.GetNumConformers() == 0:
AllChem.EmbedMolecule(mol) # Generate a conformer if none exists
# Extract coordinate
ligand_conformer = mol.GetConformer()
atom_coordinates = np.array([
[ligand_conformer.GetAtomPosition(atom_id)[0], ligand_conformer.GetAtomPosition(atom_id)[1],
ligand_conformer.GetAtomPosition(atom_id)[2]]], dtype=np.float32)
atom_distances = np.sqrt(np.sum(np.square(atom_coordinates.reshape(-1, 1, 3) - atom_coordinates.reshape(1, -1, 3)), axis=-1))
interaction = np.where(atom_distances < 4.0, 1., 0.)
mol_interactions = [interaction.reshape(-1)]
atom_features = np.concatenate(atom_features).astype(np.float32)
mol_interactions = np.concatenate(mol_interactions).astype(np.float32)
all_atom_features.append(atom_features)
all_atom_features.append(mol_interactions)
return NodeFeatures(
real_valued_features=all_atom_features,
categorical_features=all_atom_class_ids,
num_categorical_classes=num_atom_classes,
)
The all_atom_features list is supposed to hold one feature vector per atom, yet in your changed code you append to it twice when iterating over the atoms. If you want to add extra atom features, you would need to add them to the atom_features list (before the np.concatenate(atom_features).astype(np.float32) is called), and keep the single append to all_atom_features.
Note that this function deals with features for atoms, not atom pairs or bonds. An example atom feature would be atom_coordinates computed in your code. The atom_distances and interaction matrices you construct are for pairs of atoms, which doesn't really fit as atom features. You could start by using atom_coordinates (only the part corresponding to the particular atom that is being considered in the loop), and perhaps also some atom-level statistics of the interaction matrix (e.g. number of other atoms closer than a given distance), and see if then you can get the code to run.