Max Hodak

29 comments by Max Hodak

cc @dakoner re: our recent emails on zinc properties

I've overhauled the dataset downloading scripts to start getting into this problem. ZINC12 is an OK, reasonably large dataset that comes with a bunch of nice properties for us to...

Yes, edited my comment to match.

The column is named `LogP`... it's not clear if it's measured or calculated, but I would have expected it to be `cLogP` if it was calculated. I know a big...

The full list of columns in the ZINC12 file is:

```
- Charge
- Desolv_apolar
- Desolv_polar
- HBA
- HBD
- LogP
- MWT
- NRB
- SMILES
...
```
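As a rough illustration of working with those columns, here is a minimal sketch that pulls `SMILES` and `LogP` out of a properties file using only the standard library. The tab delimiter, the `read_properties` helper, and the sample row are all assumptions made for this example (the values are placeholders, not real ZINC12 data), and only the columns visible in the truncated list above are used.

```python
import csv
import io

# Only the columns visible in the comment above; the original list is truncated.
KNOWN_COLUMNS = ["Charge", "Desolv_apolar", "Desolv_polar", "HBA", "HBD",
                 "LogP", "MWT", "NRB", "SMILES"]


def read_properties(handle, wanted=("SMILES", "LogP"), delimiter="\t"):
    """Yield one dict per row, keeping only the wanted columns.

    Hypothetical helper; the tab delimiter is an assumption about the file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    for row in reader:
        yield {c: row[c] for c in wanted}


# Placeholder row with made-up values, used only to exercise the function.
sample = io.StringIO(
    "\t".join(KNOWN_COLUMNS) + "\n"
    + "\t".join(["0", "0.0", "0.0", "1", "1", "-0.1", "46.07", "0", "CCO"]) + "\n"
)
rows = list(read_properties(sample))
```

Values come back as strings; whether `LogP` should be parsed as a float, and whether it is measured or calculated, is left open here.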

I think one of the decisions to be made is what the ambition of this repo is: are we just reproducing the one paper? Do we want to expand upon...

`ReduceLROnPlateau` wasn't in a numbered release yet, which is why I changed it to pull from git. If they've caught up, we can change back.

`smilesparser.py` looks pretty interesting. What direction are you thinking of taking this?

> one-hot encoding things like Cl and [NH+] as a separate token has also worked

I'm not a fan of hand-engineering these, but one thing you might do is try...
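The idea in the quoted line, treating multi-character pieces such as `Cl` and `[NH+]` as single tokens before one-hot encoding, can be sketched roughly as follows. This is not the repo's code: the regex, the function names, and the example SMILES string are assumptions chosen for illustration, and the pattern only covers bracket atoms plus the two-letter halogens.

```python
import re

# Bracket atoms like [NH+] first, then two-letter halogens, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Cl|Br|.")


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping Cl, Br and [..] groups whole."""
    return TOKEN_RE.findall(smiles)


def build_vocab(smiles_list):
    """Map each distinct token to an integer index (sorted for determinism)."""
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(smiles, vocab):
    """Return one row per token, each a list with a single 1 at the token's index."""
    rows = []
    for tok in tokenize(smiles):
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


example = "C[NH+](C)CCl"  # illustrative string only
tokens = tokenize(example)
vocab = build_vocab([example])
encoded = one_hot(example, vocab)
```

Compared with a per-character charset, this shortens sequences and enlarges the vocabulary, which is the hand-engineering trade-off under discussion.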

Not against increasing the charset size. What you're saying makes a lot of sense. But to the extent we're biasing the input with structure, I'd much rather it was extracted...