deep-rules
Unsupervised learning for protein sequences
Have you checked the list of proposed tips to see if the tip has already been proposed?
- [x] Yes
Did you add yourself as a contributor by making a pull request if this is your first contribution?
- [x] Yes, I added myself or am already a contributor
There has been a fair amount of discussion on Twitter over the past few days about how to properly evaluate deep learning models that learn representations of protein sequences. That discussion may provide good examples of how to evaluate such models. For reference:
- https://twitter.com/larsjuhljensen/status/1124983525873156096
- Comments on https://doi.org/10.1101/622803
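I haven't looked at these papers in particular, but the discussion reminds me of related discussions in biochemistry, such as https://doi.org/10.1021/acs.jcim.7b00403. In that domain, there are pitfalls when dataset splits do not account for chemical similarity: if near-identical examples land on both sides of the split, test performance can reward memorization rather than generalization.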
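
To make the split pitfall concrete, here is a minimal sketch of a similarity-aware split. It is not taken from any of the linked papers. The function names (`cluster_by_similarity`, `cluster_split`), the 3-mer Jaccard similarity, and the 0.5 threshold are all assumptions chosen for illustration; a real evaluation would presumably use a proper sequence-identity or chemical-similarity measure instead of this toy one. The idea is only that similar items are grouped first, and whole groups are then assigned to train or test.

```python
import random
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(seqs, threshold=0.5, k=3):
    """Single-linkage clustering: link pairs with k-mer Jaccard >= threshold.

    Toy stand-in for a real similarity measure (assumption, not from the thread).
    Returns one cluster label per input sequence.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def cluster_split(seqs, test_fraction=0.2, threshold=0.5, seed=0):
    """Assign whole clusters to train or test so similar items never straddle the split."""
    labels = cluster_by_similarity(seqs, threshold)
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    groups = list(clusters.values())
    random.Random(seed).shuffle(groups)

    target = test_fraction * len(seqs)
    train, test = [], []
    for group in groups:
        # Fill the test set with whole clusters until it reaches the target size.
        (test if len(test) < target else train).extend(group)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Made-up sequences: two near-duplicate pairs plus one unrelated sequence.
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVV", "GSHMLEDPVA", "WWWWCCCCHH"]
    train_idx, test_idx = cluster_split(seqs)
    print("train:", train_idx, "test:", test_idx)
```

A plain random split of the same five sequences could put one member of a near-duplicate pair in train and the other in test, which is the leakage the splits above are meant to avoid. Because whole clusters are assigned at once, the test set can end up larger than `test_fraction` suggests.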