
Tautomers

Open Sagar-Gore opened this issue 4 years ago • 3 comments

I am still using REINVENT 2.0 (I have not updated to 3.0 yet) and have observed that incorrect tautomers (those less abundant at a specific pH) are generated. I used QED as a scoring function component in RL mode. Because these are still valid SMILES, the incorrect tautomers are not penalised and are present even in the last batches of generated molecules. Examples of molecules with incorrect tautomers: c1ccccc1-c1[nH]c(=N)nc(OC)c1 and c1(=N)[nH]c(C)c(C#N)c(OC2CCCC2)n1. Is there some way to tailor the scoring function components in REINVENT 2.0 to generate correct tautomers? Are there updates in REINVENT 3.0 that would help generate correct tautomers? Thank you.

Sagar-Gore, Oct 29 '21 09:10

Thanks for bringing this up; it is indeed a good point. I have always thought it is a bit tricky to penalize generation of a specific tautomer, since the molecules are still valid and this may confuse the learning. I can't remember ever proving this assumption, though, so it may turn out to be beneficial to have a component that propagates back a score for incorrect tautomers.

Another alternative is to add certain tautomer patterns to the Custom Alerts component and thus prevent them from appearing. The model will clearly learn that. One could even use a softer penalty than the one used in Custom Alerts (since that one is binary, 0 or 1).

A third alternative for dealing with these molecules is to introduce the tautomer canonicalization implemented in RDKit: http://rdkit.blogspot.com/2020/01/trying-out-new-tautomer.html I believe I had it in earlier versions, but it did impact the speed (slightly), and I think there were some other issues that made me reconsider and stick with just conventional canonicalization. Again, this was a few RDKit versions ago, so it is probably worth trying out again.

Sadly, we have a few other items as top priority, so I can't promise it would be addressed right away.
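
For anyone who wants to experiment with the alert, soft-penalty, and canonicalization ideas, here is a minimal sketch in Python. It is not REINVENT code: the helper names (`soft_penalty`, `has_alert`, `canonical_tautomer`), the 0.5 default factor, and the SMARTS pattern are all assumptions made for illustration. The SMARTS is only a rough guess at the imino motif in the two reported molecules (a ring N-H next to a ring carbon bearing an exocyclic =NH) and should be checked against your own data. The RDKit calls are imported lazily, so the penalty helper can be used on its own.

```python
from typing import Iterable, Optional

# Hypothetical alert list (an assumption, not shipped with REINVENT):
# ring N-H bonded to a ring carbon that carries an exocyclic =NH.
IMINO_TAUTOMER_ALERTS = ("[#7;R;H1]~[#6;R]=[NX2;H1;!R]",)


def soft_penalty(score: float, alert_hit: bool, factor: float = 0.5) -> float:
    """Scale a score by `factor` when an alert fires instead of zeroing it.

    factor=0.0 reproduces a binary (0 or 1) alert; factor=1.0 disables it.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    return score * factor if alert_hit else score


def has_alert(smiles: str, alerts: Iterable[str] = IMINO_TAUTOMER_ALERTS) -> bool:
    """Return True if the molecule matches any SMARTS alert (needs RDKit)."""
    from rdkit import Chem  # lazy import: only needed for substructure matching

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES: nothing to match
        return False
    return any(mol.HasSubstructMatch(Chem.MolFromSmarts(s)) for s in alerts)


def canonical_tautomer(smiles: str) -> Optional[str]:
    """Return the SMILES of RDKit's canonical tautomer, or None if unparsable."""
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    enumerator = rdMolStandardize.TautomerEnumerator()
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))
```

A possible use would be `soft_penalty(qed_score, has_alert(smiles))` inside a custom scoring step, or passing each generated SMILES through `canonical_tautomer` before scoring. Note that RDKit's canonical tautomer is a reproducible choice, not necessarily the most abundant form at a given pH.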

patronov, Oct 30 '21 17:10

Thank you for these suggestions. I shall try to implement these approaches and compare the generated molecules.

Sagar-Gore, Nov 01 '21 09:11

Looking forward to hearing about the outcome 👍

patronov, Nov 01 '21 10:11