RepeatMasker Fasta file contains a sequence identifier which is too long

I received this error when running RepeatMasker on a genome:

FastaDB::_cleanIndexAndCompact(): Fasta file contains a sequence identifier which is too long ( max id length = 50 ) at /usr/local/RepeatMasker/RepeatMasker line 1541.

WARNING: Retrying batch ( 9151 ) [ 25,, 58446]...

The run then eventually fails.

However, I have checked all of the headers in my fasta file and the longest is 46 characters, which is still less than the max id length, so I am wondering why this error has occurred.

Is the true maximum header length less than 50 characters?

Apr 22 '20 11:04 TobyBaril

I made a short test file with a 46-character header and it was fine. In fact, exactly 50 characters is fine and 51 fails.

Can you share a link to your exact fasta file if it's publicly available, and/or post the sequence headers (i.e. the output of grep -E '^>' genome.fa)? That would help to debug whether there is some subtle issue with the counting.
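If it is easier than eyeballing the grep output, the lengths can also be counted with a short script. This is only a rough sketch: the function names are made up for illustration, and it assumes the identifier is the text between ">" and the first whitespace, which may not match exactly how RepeatMasker counts.

```python
def id_lengths(lines):
    """Yield (identifier, length) for each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            # Assumption: the identifier ends at the first whitespace.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            yield seq_id, len(seq_id)


def longest_ids(lines, top=5):
    """Return the `top` longest identifiers, longest first."""
    return sorted(id_lengths(lines), key=lambda pair: pair[1], reverse=True)[:top]


# Example usage (genome.fa is the file name from the grep command above):
# with open("genome.fa") as handle:
#     for seq_id, length in longest_ids(handle):
#         print(length, seq_id)
```

Anything reported close to 50 would be worth a closer look.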

Apr 22 '20 17:04 jebrosen

Unfortunately the fasta isn't publicly available, but the headers are here:

https://drive.google.com/file/d/15mNTeTSvi30BZTOdQ6tB9kpl-q1-ovFb/view?usp=sharing

Apr 23 '20 09:04 TobyBaril

Ah, I think I see what's happening now. Long sequences -- longer than I tested for my previous comment -- are split into multiple files and the text frag-<number> is added to the sequence name to keep track. So the effective maximum header length does end up being closer to 40 or 45, for sequences that have to be split up.
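To see which identifiers would be affected, something like the sketch below could help. The 50-character limit comes from the error message; the length of the added suffix is an assumption that you pass in yourself, since the exact separator and the number of digits in frag-<number> are not something I have pinned down here. The function name is hypothetical.

```python
MAX_ID_LENGTH = 50  # limit reported in the RepeatMasker error message


def ids_over_budget(ids, suffix_length):
    """Return identifiers that would exceed the limit once a fragment
    suffix of `suffix_length` characters is appended.

    `suffix_length` is an assumption supplied by the caller, not a value
    read from RepeatMasker.
    """
    budget = MAX_ID_LENGTH - suffix_length
    return [seq_id for seq_id in ids if len(seq_id) > budget]


# Example: with a hypothetical 8-character suffix, a 46-character
# identifier no longer fits, while a 40-character one still does.
ids = ["a" * 46, "b" * 40]
too_long = ids_over_budget(ids, suffix_length=8)
```

Only sequences long enough to be split would actually get the suffix, so this is a conservative check.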

Apr 23 '20 17:04 jebrosen

Ah, okay! This makes sense. I have implemented a pre-processing step in my pipeline to rename headers, so this should be fine for me now.
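For anyone landing here later, a renaming step along these lines is one way to do it. This is a minimal sketch rather than the exact step from my pipeline: the function name and the "seq" prefix are illustrative, and it keeps a mapping so the original headers can be restored after masking.

```python
def rename_headers(lines, prefix="seq"):
    """Replace each FASTA header with a short sequential identifier.

    Returns the rewritten lines and a dict mapping each new identifier
    back to the original header text (without the leading '>').
    """
    mapping = {}
    renamed = []
    for line in lines:
        if line.startswith(">"):
            new_id = f"{prefix}{len(mapping) + 1}"
            mapping[new_id] = line[1:].rstrip("\n")
            renamed.append(f">{new_id}\n")
        else:
            # Sequence lines pass through unchanged.
            renamed.append(line)
    return renamed, mapping


# Example usage (file names are placeholders):
# with open("genome.fa") as src:
#     renamed, mapping = rename_headers(src)
# with open("genome.renamed.fa", "w") as dst:
#     dst.writelines(renamed)
```

Short identifiers such as seq1, seq2 leave plenty of room for any suffix added when long sequences are split.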

Apr 24 '20 09:04 TobyBaril
