foldcomp
ESMFold database header issues
When I extract FASTA from highquality_clust30 I receive the following headers.
>ESMFOLD V0 PREDICTION FOR MGYP000138429313
>ESMFOLD V0 PREDICTION FOR MGYP001595280761
...
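I use FoldComp for a downstream application. By the usual FASTA convention, the sequence ID is the first whitespace-delimited token after ">", so every sequence here ends up with the ID ESMFOLD, which is not unique. The unique ID (the MGYP accession) is left in the description part of the header.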
I can run sed on it, but this solution feels hacky.
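For reference, the workaround would look roughly like the Python sketch below. It assumes every title ends with the MGYP accession, as in the two headers shown above; the function name rewrite_headers is hypothetical and not part of FoldComp.

```python
import re

# Assumption: the unique ID is an MGYP accession at the very end of the title,
# e.g. ">ESMFOLD V0 PREDICTION FOR MGYP000138429313".
ACCESSION = re.compile(r"(MGYP\d+)\s*$")


def rewrite_headers(lines):
    """Yield FASTA lines, replacing each header with '>' + its MGYP accession.

    Sequence lines and headers without a trailing accession pass through
    unchanged (minus the trailing newline).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = ACCESSION.search(line)
            if match:
                line = ">" + match.group(1)
        yield line


if __name__ == "__main__":
    example = [
        ">ESMFOLD V0 PREDICTION FOR MGYP000138429313\n",
        "MKV\n",
    ]
    for out in rewrite_headers(example):
        print(out)
```

This does the same job as a sed one-liner, so it is still post-processing rather than a proper fix in the extraction step.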
The highquality_clust30.lookup looks appropriate:
0 MGYP002174220927 0
1 MGYP000064029927 0
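As a sanity check, the lookup file can be loaded into a mapping and compared against the IDs recovered from the FASTA titles. The sketch below assumes only what the two lines above show: whitespace-separated columns with an integer index first and the entry name second. It makes no assumption about the meaning of the third column, and read_lookup is a hypothetical helper, not a FoldComp function.

```python
def read_lookup(lines):
    """Map entry index -> entry name from whitespace-separated lookup lines.

    Assumes column 1 is an integer index and column 2 is the name;
    any further columns are ignored.
    """
    names = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        names[int(fields[0])] = fields[1]
    return names


if __name__ == "__main__":
    sample = [
        "0 MGYP002174220927 0",
        "1 MGYP000064029927 0",
    ]
    print(read_lookup(sample))
```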
Do you have recommendations on how to get proper FASTA headers?
Cheers V
Sorry for the late response. In 412c7a8 I changed the default so that extracted sequences use the id/filename as the header, and introduced a use-title flag for cases where the title is needed.