Instructions on building the latest RepBase database
What do you want to know? Where can I find the instructions to build/use the latest RepBase26.05 with RepeatMasker?
- Have you installed RepBase RepeatMasker Edition for RepeatMasker? I am using the version from conda: repeatmasker 4.1.1 pl526_1
If you do not necessarily need the latest version of RepBase, you can still find instructions at http://www.repeatmasker.org/RepeatMasker/ for installation of the latest RepBase RepeatMasker Edition, version 20181026. We have not tested this with a bioconda installation, and this may not work well. We also have instructions for setting up and using a separate customized Libraries/ directory at https://github.com/Dfam-consortium/TETools/#using-repbase-repeatmasker-edition for the Dfam TE Tools container, which I do expect to work fine with a bioconda installation.
Another option is to filter the latest RepBase library down to the species or clades relevant to your target genome, save the result as FASTA, and pass it to RepeatMasker with the -lib option. However, doing this may lose some of RepeatMasker's enhancements that improve repeat annotation in certain species, especially human, mouse, and other mammals.
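As a rough illustration of the filtering step, here is a minimal Python sketch. It assumes the RepBase library is already in FASTA form and that each header line mentions the species or clade name somewhere in its text; please check your actual RepBase download, since I am not certain of the exact header layout in 26.05. The function names, file names, and demo records are made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_taxa(records: Iterable[Record], taxa: Iterable[str]) -> List[Record]:
    """Keep records whose header contains any taxon name (case-insensitive).

    Assumption: the species/clade name appears as plain text in the header.
    """
    wanted = [t.lower() for t in taxa]
    return [(h, s) for h, s in records if any(t in h.lower() for t in wanted)]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with wrapped sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Made-up demo records; real input would be read from your RepBase FASTA file.
    demo = (
        ">L1-1_DR\tL1\tDanio rerio\nACGTACGT\nACGT\n"
        ">ALU\tSINE1/7SL\tPrimates\nGGCCGGGC\n"
    )
    kept = filter_by_taxa(read_fasta(demo.splitlines()), ["Danio rerio"])
    print(to_fasta(kept), end="")
    # The filtered file (hypothetical name) could then be used as:
    #   RepeatMasker -lib filtered_repbase.fa genome.fa
```

Matching on a substring of the header is deliberately crude: it will miss families annotated only at a higher clade level unless you list those clade names as well, so you will probably want to pass several names (species, genus, and broader groups) for your target genome.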
I am sorry to say there are no instructions for updating to a recent version of RepBase while still preserving the other improvements. Historically, our group has maintained RepeatMasker while also making contributions to RepBase and maintaining the "RepBase RepeatMasker Edition" (see also https://github.com/rmhubley/RepeatMasker/issues/16#issuecomment-480343702 ). This was not a small task; it involved a significant amount of expert-guided reconciliation and quality control to keep up with ever-changing nomenclature and to preserve the RepeatMasker-specific enhancements mentioned above.
Unfortunately, the agreement under which we were able to perform these updates no longer applies, and pursuing this model further is not a priority of ours. Instead, we have been focusing our efforts on our Dfam database, an open-access resource which is included with RepeatMasker. Today Dfam has fewer well-curated families, in fewer organisms, than RepBase, but we have been working hard over the last several years to integrate more data sets, and this situation should only improve with time.
I hope this helps answer your question. Please do let us know if you encounter problems with any of the suggestions I gave above.