ColabFold

Feature request: sequence from UniProt ID and residue numbers

[Open] gezmi opened this issue 3 years ago • 2 comments

Hi,

These notebooks are super cool, thanks so much for the continuous updates. I have one feature request: pulling in sequences from UniProt IDs, ideally also cutting out a specified residue range. This would save having to go to UniProt, look up residue numbers, and copy-paste the needed sequences. One possible format could be, e.g., P12004/23-45:P99999
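To make the proposed format concrete, here is a minimal parsing sketch. It assumes that `:` separates chains and that `/start-end` is an optional, 1-based, inclusive residue range following an accession; the names `parse_query` and `ChainSpec` are hypothetical and not part of ColabFold.

```python
import re
from typing import List, NamedTuple, Optional


class ChainSpec(NamedTuple):
    """One chain of a hypothetical query such as 'P12004/23-45'."""
    accession: str
    start: Optional[int]  # 1-based, inclusive; None means the full sequence
    end: Optional[int]    # 1-based, inclusive; None means the full sequence


# Accession token, optionally followed by '/start-end'.
_TOKEN = re.compile(r"^([A-Za-z0-9_]+)(?:/(\d+)-(\d+))?$")


def parse_query(query: str) -> List[ChainSpec]:
    """Split a query like 'P12004/23-45:P99999' into per-chain specs."""
    specs = []
    for token in query.strip().split(":"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse chain spec: {token!r}")
        acc, start, end = match.groups()
        if start is None:
            # No range given: keep the whole sequence.
            specs.append(ChainSpec(acc, None, None))
            continue
        start_i, end_i = int(start), int(end)
        if not 1 <= start_i <= end_i:
            raise ValueError(f"invalid residue range in {token!r}")
        specs.append(ChainSpec(acc, start_i, end_i))
    return specs
```

For example, `parse_query("P12004/23-45:P99999")` returns `[ChainSpec(accession='P12004', start=23, end=45), ChainSpec(accession='P99999', start=None, end=None)]`. Keeping parsing separate from downloading means the same specs could later be resolved against UniProt or any other source.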

Or should I implement it myself and open a pull request with this feature?

Thank you!

Julia

gezmi · Jul 06 '22 07:07

This looks like a great idea. It would be quite easy to implement if we used sequences instead of UniProt IDs:

MAVAVATAAAY....W/23-45:MGVGAWLI...
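A sequence-based variant of the format above could be handled roughly as follows. This is a sketch under stated assumptions: `:` separates chains, `/start-end` is a 1-based, inclusive range, and the function name `cut_chains` is hypothetical. The sequences in the usage note are toy examples, not real proteins.

```python
import re
from typing import List

# A chain: amino-acid letters, optionally followed by '/start-end' (1-based, inclusive).
_CHAIN = re.compile(r"^([A-Za-z]+)(?:/(\d+)-(\d+))?$")


def cut_chains(query: str) -> List[str]:
    """Return each ':'-separated chain, trimmed to its residue range if one is given."""
    chains = []
    # Drop all whitespace so pasted, line-wrapped sequences still parse.
    for token in re.sub(r"\s+", "", query).split(":"):
        match = _CHAIN.match(token)
        if match is None:
            raise ValueError(f"cannot parse chain: {token!r}")
        seq, start, end = match.groups()
        seq = seq.upper()
        if start is None:
            chains.append(seq)
            continue
        start_i, end_i = int(start), int(end)
        if not 1 <= start_i <= end_i <= len(seq):
            raise ValueError(f"range {start_i}-{end_i} outside 1-{len(seq)}")
        # Convert the 1-based inclusive range to a 0-based Python slice.
        chains.append(seq[start_i - 1:end_i])
    return chains
```

For example, `cut_chains("MAVAVATAAAYW/3-6:MGVGAWLI")` returns `['VAVA', 'MGVGAWLI']`, and `":".join(...)` of that result gives back a plain multi-chain query. No database access is involved, which matches the preference for avoiding a direct UniProt interface.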

It would probably be good to avoid a direct interface to UniProt; otherwise, we might need to support several more databases. What do you think?

martin-steinegger · Jul 06 '22 08:07

That could also work. I understand that if you support one database, people will probably start asking for more. But being able to paste a full sequence without having to find the start and end positions would already be a great help!

If you decide to support UniProt (and/or other databases), I have already written a function that downloads a sequence from UniProt based on its accession (AC). I can share it and could also help with interfacing with other databases.
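The download function mentioned here is not shown in the thread. As a hypothetical stand-in only, the sketch below fetches one entry as FASTA, assuming UniProt's REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta`; the helper names are illustrative. FASTA parsing is kept separate so it can be checked without network access.

```python
import urllib.request

# Assumption: UniProt REST endpoint that returns a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta_sequence(fasta_text: str) -> str:
    """Return the sequence of the first record in a FASTA string."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    seq_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record, if any
            break
        seq_lines.append(line.strip())
    return "".join(seq_lines)


def fetch_uniprot_sequence(accession: str, timeout: float = 30.0) -> str:
    """Download the sequence of a UniProt entry by accession (requires network)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_fasta_sequence(response.read().decode("utf-8"))
```

A residue range such as 23-45 could then be applied to the returned string with an ordinary slice, `seq[22:45]`.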

gezmi · Jul 06 '22 12:07