Any limit on the input protein length for ESM C?
Thank you for your great work.
I am currently using the ESM C model to generate protein embeddings.
Is there a maximum sequence length that the model can handle?
Thank you for your assistance!
According to the blog post, it looks like 2048:
Training stages. ESM C is trained in two stages:
Stage 1: For the first 1 million steps, the model uses a context length of 512, with metagenomic data constituting 64% of the training dataset.
Stage 2: In the final 500,000 steps, the context length is increased to 2048, and the proportion of metagenomic data is reduced to 37.5%.
2048 is correct. Embedding quality probably drops dramatically as you increase the length past that, since the model was never trained on longer contexts.
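If you need embeddings for a protein longer than 2048, one possible workaround is to split it into overlapping windows that each fit in the training context and embed them separately. Here is a minimal sketch of just the splitting step. It is not part of the ESM library: `split_sequence`, the 256-residue overlap, and the choice to reserve 2 positions for start/end tokens are all assumptions made for illustration, so check how your tokenizer counts special tokens before relying on the exact numbers.

```python
from typing import List, Tuple

MAX_CONTEXT = 2048   # stage-2 training context length quoted from the blog post
SPECIAL_TOKENS = 2   # assumption: one start and one end token count toward the context


def split_sequence(
    seq: str,
    max_residues: int = MAX_CONTEXT - SPECIAL_TOKENS,
    overlap: int = 256,  # hypothetical default, tune for your use case
) -> List[Tuple[int, str]]:
    """Return (start_offset, window) pairs that cover seq.

    Each window holds at most max_residues residues, and consecutive
    windows share `overlap` residues.
    """
    if max_residues <= 0:
        raise ValueError("max_residues must be positive")
    if not 0 <= overlap < max_residues:
        raise ValueError("overlap must satisfy 0 <= overlap < max_residues")

    # Short sequences fit in a single window and need no splitting.
    if len(seq) <= max_residues:
        return [(0, seq)]

    step = max_residues - overlap
    windows: List[Tuple[int, str]] = []
    start = 0
    while True:
        end = start + max_residues
        windows.append((start, seq[start:end]))
        if end >= len(seq):  # this window reached the end of the protein
            break
        start += step
    return windows


if __name__ == "__main__":
    for offset, window in split_sequence("M" * 5000):
        print(offset, len(window))
```

Each window can then be passed to the model on its own, and the start offsets let you map per-residue embeddings back to positions in the full protein (for example, by averaging over the overlapping regions). Keep in mind that residues near a window boundary see less context than they would in the intact sequence, so this is a compromise rather than an equivalent of a longer context.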