Max Hodak
cc @dakoner re: our recent emails on ZINC properties
I've overhauled the dataset downloading scripts to start getting into this problem. ZINC12 is a reasonably large dataset that comes with a bunch of nice properties for us to...
Yes, edited my comment to match.
The column is named `LogP`; it's not clear whether it's measured or calculated, but I would have expected it to be named `cLogP` if it were calculated. I know a big...
The full list of columns in the ZINC12 file is:
```
- Charge
- Desolv_apolar
- Desolv_polar
- HBA
- HBD
- LogP
- MWT
- NRB
- SMILES
- ...
```
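As a minimal sketch of pulling the two columns we care about out of that file: this assumes the ZINC12 property table is tab-separated with a header row using the column names listed above. The delimiter, the helper name `smiles_and_logp`, and the file name mentioned in the comment are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def smiles_and_logp(handle: TextIO) -> Iterator[Tuple[str, float]]:
    """Yield (SMILES, LogP) pairs from a ZINC12-style property table.

    Assumption: the table is tab-separated and its header row names the
    columns exactly "SMILES" and "LogP"; all other columns are ignored.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row["SMILES"], float(row["LogP"])


if __name__ == "__main__":
    # Dummy table for illustration only: the numbers are placeholders, not
    # real ZINC12 values. A real run would open the downloaded file instead,
    # e.g. open("zinc12_props.tsv") (hypothetical file name).
    sample = io.StringIO("MWT\tLogP\tSMILES\n100.0\t1.0\tCCO\n200.0\t2.5\tc1ccccc1\n")
    for smiles, logp in smiles_and_logp(sample):
        print(smiles, logp)
```

Using `csv.DictReader` keys rows by header name, so the sketch keeps working if the column order in the file differs.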
I think one of the decisions to be made is what the ambition of this repo is: are we just reproducing the one paper? Do we want to expand upon...
`ReduceLROnPlateau` wasn't in a numbered release yet, which is why I changed the dependency to pull from git. Once a release catches up, we can change back.
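For context, what this callback provides is a plateau-based learning-rate schedule. The sketch below is a simplified, hypothetical restatement of that rule in plain Python (minimize mode only, no cooldown); it is not the Keras implementation, and the function name `plateau_schedule` and its defaults are chosen for illustration.

```python
from typing import List, Sequence


def plateau_schedule(
    losses: Sequence[float],
    initial_lr: float,
    factor: float = 0.1,
    patience: int = 10,
    min_lr: float = 0.0,
    min_delta: float = 0.0,
) -> List[float]:
    """Return the learning rate in effect after each epoch.

    Simplified plateau rule: if the monitored loss fails to beat the best
    value by more than `min_delta` for `patience` consecutive epochs,
    multiply the learning rate by `factor`, never going below `min_lr`.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the last improvement
    history = []
    for loss in losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0  # start counting again at the new learning rate
        history.append(lr)
    return history


if __name__ == "__main__":
    # The loss fails to improve on 0.8 for two epochs, so with patience=2
    # the learning rate is halved at the fourth epoch.
    print(plateau_schedule([1.0, 0.8, 0.8, 0.8, 0.7], initial_lr=1.0, factor=0.5, patience=2))
    # → [1.0, 1.0, 1.0, 0.5, 0.5]
```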
`smilesparser.py` looks pretty interesting. What direction are you thinking of taking this?
> one-hot encoding things like Cl and [NH+] as a separate token has also worked

I'm not a fan of hand-engineering these, but one thing you might do is try...
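The multi-character tokenization being discussed can be sketched as follows. This is a hypothetical illustration, not code from this repo: the regex, the helper names `tokenize`, `build_charset`, and `one_hot`, and the space padding token are all assumptions. Bracket atoms such as `[NH+]`, the two-letter halogens `Cl` and `Br`, and two-digit ring closures like `%10` each become one token; every other character stays a token of its own.

```python
import re
from typing import Dict, Iterable, List

# Alternatives are tried left to right, so multi-character tokens win over
# the single-character fallback "." at the same position.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")
PAD = " "  # hypothetical padding token


def tokenize(smiles: str) -> List[str]:
    return TOKEN_RE.findall(smiles)


def build_charset(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Map every token seen in the data (plus PAD) to a column index."""
    tokens = {tok for s in smiles_list for tok in tokenize(s)}
    tokens.add(PAD)
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def one_hot(smiles: str, charset: Dict[str, int], max_len: int) -> List[List[int]]:
    """Encode one SMILES string as a max_len x len(charset) 0/1 matrix."""
    tokens = tokenize(smiles)
    if len(tokens) > max_len:
        raise ValueError(f"{smiles!r} has {len(tokens)} tokens, max is {max_len}")
    tokens += [PAD] * (max_len - len(tokens))
    matrix = []
    for tok in tokens:
        row = [0] * len(charset)
        row[charset[tok]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    print(tokenize("C[NH+](C)Cl"))
    # → ['C', '[NH+]', '(', 'C', ')', 'Cl']
```

Each distinct bracket atom that appears in the data adds its own column to the charset, so this scheme trades a larger charset for shorter, chemically meaningful sequences.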
Not against increasing the charset size. What you're saying makes a lot of sense. But to the extent we're biasing the input with structure, I'd much rather it were extracted...