getSpheres only returns atoms in the sphere=1 layer when generating a HOSE code

Open biotech7 opened this issue 3 years ago • 4 comments

🐛 Bug

Calling getSpheres(...) to get all atoms within the specified number of spheres returns only the sphere=1 atoms and none from the outer spheres.

To Reproduce

Steps to reproduce the behavior:

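```java
import java.util.List;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.tools.HOSECodeGenerator;

public class HoseSpheresRepro {
    public static void main(String[] args) throws Exception {
        String smiles = "CCC1=CC=CC(=C1NC(=O)CN(CC(=O)O)CC(=O)O)CC"; // Universal SMILES
        int sphere = 6; // number of spheres requested
        HOSECodeGenerator hcg = new HOSECodeGenerator();
        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        SmilesParser smipar = new SmilesParser(bldr);
        IAtomContainer mol = smipar.parseSmiles(smiles);

        // Expected: one atom list per sphere around atom 9; observed: only sphere 1 is populated
        List<IAtom>[] nodeAtoms = hcg.getSpheres(mol, mol.getAtom(9), sphere, false);
        for (int i = 0; i < sphere; i++) {
            for (IAtom iAtom : nodeAtoms[i]) {
                System.out.println(iAtom.getSymbol());
            }
            System.out.println("-------");
        }
    }
}
```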
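
To make the expected result concrete, here is a minimal sketch of what "atoms per sphere" means, written in plain Java so it does not depend on CDK. It is not how HOSECodeGenerator is implemented. The class name `SphereLayers`, the method `spheres`, and the adjacency-list input are all hypothetical, and it assumes that sphere 1 means the direct neighbours of the root atom. It groups atom indices by bond distance from the root using a breadth-first search.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class SphereLayers {

    /**
     * Groups atom indices by topological (bond-count) distance from root.
     * Element 0 of the result holds sphere 1 (direct neighbours of root),
     * element 1 holds sphere 2, and so on up to maxSphere.
     * adjacency[i] lists the indices of the atoms bonded to atom i.
     */
    public static List<List<Integer>> spheres(int[][] adjacency, int root, int maxSphere) {
        int[] dist = new int[adjacency.length];
        Arrays.fill(dist, -1); // -1 marks atoms not yet visited
        dist[root] = 0;

        List<List<Integer>> layers = new ArrayList<>();
        for (int i = 0; i < maxSphere; i++) {
            layers.add(new ArrayList<>());
        }

        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            int atom = queue.remove();
            if (dist[atom] == maxSphere) {
                continue; // do not expand beyond the last requested sphere
            }
            for (int nbr : adjacency[atom]) {
                if (dist[nbr] != -1) {
                    continue; // already assigned to a closer or equal sphere
                }
                dist[nbr] = dist[atom] + 1;
                layers.get(dist[nbr] - 1).add(nbr);
                queue.add(nbr);
            }
        }
        return layers;
    }

    public static void main(String[] args) {
        // Toy graph: a five-atom chain 0-1-2-3-4, rooted at the middle atom
        int[][] chain = { {1}, {0, 2}, {1, 3}, {2, 4}, {3} };
        List<List<Integer>> layers = spheres(chain, 2, 3);
        for (int i = 0; i < layers.size(); i++) {
            System.out.println("sphere " + (i + 1) + ": " + layers.get(i));
        }
    }
}
```

For the toy chain, spheres 1 and 2 each contain two atoms and sphere 3 is empty. By analogy, with sphere = 6 the report above would expect several non-empty layers around atom 9, not just the first one.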

Environment

  • OS: win10
  • IDE: IDEA 2021
  • JDK:1.8
  • CDK version:2.8

test

biotech7 • Jan 26 '23 06:01

Thanks. Please note that HOSE codes are an outdated method; something like Signatures or the CircularFingerprinter is much better.

johnmay • Jan 26 '23 09:01

John, thanks for your valuable advice. It seems likely that CircularFingerprinter output cannot be "reversed" into the corresponding molecular structure, which is what I need. Signatures could satisfy my requirements. However, I could not work out from the CDK JavaDoc how to convert an atom index into the vertexIndex expected by signatureStringForVertex(vertexIndex, height). Signatures also usually discard aromaticity and chirality information when "cutting" a fragment out of the full molecule. Is there a straightforward strategy for recovering the information lost from signatures? All of this information plays a key role in AI modelling.

biotech7 • Jan 27 '23 06:01

You should be able to get the atom info from the circular fingerprint… but I need to check.

Anyway, the main point was that you probably don’t want HOSE codes :)

johnmay • Jan 27 '23 08:01

Key properties such as aromaticity, charge, and stereo type, as well as the height/sphere, stored in a substructure derived from the CircularFP would be highly preferred.

biotech7 • Jan 31 '23 09:01