
Error Configuring the new version of RM

Open ohan-Bioinfo opened this issue 5 years ago • 12 comments

Greetings,

The new version of RepeatMasker is causing a lot of trouble for me.

After installing the new-format libraries (Dfam.h5 and RepBase), I ran configure and got a lot of warnings:

./configure
 -- Setting Perl interpreter...

RepeatMasker Configuration Program


Checking for libraries...

Rebuilding RepeatMaskerLib.h5 master library
  Merging Dfam + RepBase into RepeatMaskerLib.h5 library..............................................................................................WARNING:__main__:Could not find taxon for 'cryptococcus_basidiomycete_yeast'
WARNING:__main__:Could not find taxon for 'cetartiodactyla'
WARNING:__main__:Could not find taxon for 'cetartiodactyla'
WARNING:__main__:Could not find taxon for 'cetartiodactyla'
WARNING:__main__:Could not find taxon for 'cetartiodactyla'
WARNING:__main__:Could not find taxon for 'cetartiodactyla'
WARNING:__main__:Could not find taxon for 'drosophila_fruit_fly_genus'
WARNING:__main__:Could not find taxon for 'drosophila_fruit_fly_genus'
WARNING:__main__:Could not find taxon for 'drosophila_fruit_fly_genus'
WARNING:__main__:Could not find taxon for 'drosophila_fruit_fly_genus'

After the warnings finished and configure moved on to the tool paths, an error occurred:

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Configured, Default ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done


Enter Selection: 5
Building FASTA version of RepeatMasker.lib ....Traceback (most recent call last):
  File "/Users/ohon_ad/Documents/Tools/RepeatMasker/famdb.py", line 1738, in <module>
    main()
  File "/Users/ohon_ad/Documents/Tools/RepeatMasker/famdb.py", line 1732, in main
    args.func(args)
  File "/Users/ohon_ad/Documents/Tools/RepeatMasker/famdb.py", line 1587, in command_families
    print_families(args, families, True, target_id)
  File "/Users/ohon_ad/Documents/Tools/RepeatMasker/famdb.py", line 1526, in print_families
    entry += family.to_fasta(
  File "/Users/ohon_ad/Documents/Tools/RepeatMasker/famdb.py", line 394, in to_fasta
    for clade_id in self.clades:
TypeError: 'NoneType' object is not iterable
.
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:
Database: Dfam withRBRM
Version: 3.2
Date: 2020-07-02

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 318520
Total HMMs: 273655


Further documentation on the program may be found here:
  /Users/ohon_ad/Documents/Tools/RepeatMasker/repeatmasker.help

Also, when I try to use the queryRepeatDatabase.pl script to extract the sequences belonging to a specific species, I get another error. Could you please advise?

./queryRepeatDatabase.pl -species X 
queryRepeatDatabase
===================
RepeatMasker Database: RepeatMaskerLib.embl
RepeatMasker Combined Database: Dfam_3.1, RepBase-20181026
Taxonomy::new() needs a path for a famdb file!
 at ./queryRepeatDatabase.pl line 267.

ohan-Bioinfo avatar Oct 14 '20 09:10 ohan-Bioinfo

From the error messages it looks like you have files from multiple versions of RepeatMasker mixed together - in this case, RMRBMeta.embl should have been updated and queryRepeatDatabase.pl should have been removed. We recommend in the installation instructions to extract RepeatMasker into a new directory instead of a pre-existing one to avoid issues such as these. Please let us know if installing in a new directory helps or not - I think it should fix both the warnings and the errors from configure.
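
A quick way to spot this kind of mixed installation is to look for files that should no longer be present in the install directory. The following is a minimal sketch, not part of RepeatMasker: the function name `find_stale_files` and the `STALE_NAMES` list are hypothetical, and the only stale file actually named in this thread is `queryRepeatDatabase.pl`.

```python
from pathlib import Path

# Hypothetical list for illustration. The only file this thread says should
# have been removed from a clean 4.1.1 directory is queryRepeatDatabase.pl.
STALE_NAMES = ("queryRepeatDatabase.pl",)


def find_stale_files(install_dir, stale_names=STALE_NAMES):
    """Return the stale files found directly inside install_dir."""
    root = Path(install_dir)
    return [root / name for name in stale_names if (root / name).is_file()]
```

Calling `find_stale_files("/path/to/RepeatMasker")` returns an empty list for a directory that contains none of the listed files; a non-empty result suggests that an older release was extracted into the same place.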

> And when I try to use queryRepeatDatabase.pl script to extract sequences belong to specific species another error Could you please advice?

The features of queryRepeatDatabase.pl have been replaced with famdb.py in RepeatMasker 4.1.1. Here is an equivalent command for famdb.py:

./famdb.py -i Libraries/RepeatMaskerLib.h5 families --ancestors --descendants X --format fasta_name

jebrosen avatar Oct 14 '20 17:10 jebrosen

Thank you for your clarification.

I rebuilt and configured it in a new location and followed the instructions for the libraries. A different error occurred; kindly have a look.

Enter Selection: 5
Building FASTA version of RepeatMasker.lib ...ERROR:__main__:Error reading file: Unable to open file (unable to open file: name = '/Users/ohon_ad/Documents/Tools/RepeatMasker/RepeatMasker_4.1.1/Libraries/RepeatMaskerLib.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

ohan-Bioinfo avatar Oct 15 '20 09:10 ohan-Bioinfo

Hm, what do you get from this command?

ls -l /Users/ohon_ad/Documents/Tools/RepeatMasker/RepeatMasker_4.1.1/Libraries/RepeatMaskerLib.h5

jebrosen avatar Oct 15 '20 15:10 jebrosen

ls: /Users/ohon_ad/Documents/Tools/RepeatMasker/RepeatMasker_4.1.1/Libraries/RepeatMaskerLib.h5: No such file or directory

ohan-Bioinfo avatar Oct 20 '20 10:10 ohan-Bioinfo

Libraries % ls
Artefacts.embl			RMRBMeta.embl			RepeatMaskerLib.h5.writing	RepeatPeps.lib.pin		RepeatPeps.lib.pto
README.RMRBSeqs			RMRBSeqs.embl			RepeatPeps.lib			RepeatPeps.lib.pot		RepeatPeps.readme
README.meta			RepeatAnnotationData.pm		RepeatPeps.lib.pdb		RepeatPeps.lib.psq		taxonomy.dat
RMRB.embl			RepeatMasker.lib		RepeatPeps.lib.phr		RepeatPeps.lib.ptf

ohan-Bioinfo avatar Oct 20 '20 10:10 ohan-Bioinfo

> RepeatMaskerLib.h5.writing

This file should have been deleted once the RepBase RepeatMasker Edition file was completely merged into the main library file, so it looks like something went wrong there.

Can you show the output of running ./addRepBase.pl in the RepeatMasker_4.1.1/ directory?

jebrosen avatar Oct 20 '20 22:10 jebrosen

sudo ./addRepBase.pl
Rebuilding RepeatMaskerLib.h5 master library
  Merging Dfam + RepBase into RepeatMaskerLib.h5 library...Failed to copy /Users/ohon_ad/Documents/Tools/RepeatMasker/RepeatMasker_4.1.1/Libraries/Dfam.h5 to /Users/ohon_ad/Documents/Tools/RepeatMasker/RepeatMasker_4.1.1/Libraries/RepeatMaskerLib.h5.writing.
Is the source file missing, or is your user missing write permissions to the directory?

ohan-Bioinfo avatar Oct 21 '20 06:10 ohan-Bioinfo

Where did you get RepeatMasker from? The download should have included the Dfam.h5 file.
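
The addRepBase.pl message above names two possible causes: a missing source file or missing write permissions. Both can be checked directly, together with the leftover temporary file. This is a minimal sketch under stated assumptions: the helper `diagnose_libraries` is hypothetical and not part of RepeatMasker, and the file names are taken from the output in this thread.

```python
import os
from pathlib import Path


def diagnose_libraries(libraries_dir):
    """Report the conditions named in the addRepBase.pl error message."""
    lib = Path(libraries_dir)
    return {
        # Source of the copy; expected to ship with the RepeatMasker download.
        "dfam_h5_present": (lib / "Dfam.h5").is_file(),
        # The current user must be able to write to the Libraries directory.
        "directory_writable": os.access(lib, os.W_OK),
        # Temporary file that should be removed after a successful merge.
        "leftover_writing_file": (lib / "RepeatMaskerLib.h5.writing").exists(),
        # Merged library that configure tries to open.
        "merged_library_present": (lib / "RepeatMaskerLib.h5").is_file(),
    }
```

For the directory listing shown earlier in this thread, such a check would report `dfam_h5_present` as false and `leftover_writing_file` as true, which matches the "Failed to copy" message.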

jebrosen avatar Oct 21 '20 15:10 jebrosen

From the RepeatMasker main page: http://www.repeatmasker.org/RepeatMasker/

ohan-Bioinfo avatar Oct 22 '20 09:10 ohan-Bioinfo

That is strange. Maybe the files did not download or extract completely?

You can check if the files are complete/uncorrupted with md5sum:

$ md5sum RepeatMasker-4.1.1.tar.gz
6df7b188757b5ef2d2575320eb0b014e  RepeatMasker-4.1.1.tar.gz

$ md5sum RepeatMasker-4.1.1/Libraries/Dfam.h5
adc805eda1f5b8a6d24a32e10befeff5  RepeatMasker-4.1.1/Libraries/Dfam.h5
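
On systems where the `md5sum` command is not available, the same check can be done with Python's standard `hashlib` module. This is a minimal sketch; the function name `md5_of_file` and the chunk size are illustrative choices, not part of RepeatMasker.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files such as Dfam.h5 are not
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result of `md5_of_file("RepeatMasker-4.1.1.tar.gz")` can then be compared with the checksum quoted above; a mismatch would point to an incomplete or corrupted download.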

jebrosen avatar Oct 26 '20 17:10 jebrosen

I'm back to using the old version for now. I do have a question about the buildSummary script. I used it a while ago, when the annotation run assumed Homo sapiens by default, and I managed to change the annotation table (.tbl) to general vertebrates. Now, when I try the same method, I get a long table that does not look like the .tbl annotation output from RM. Could you please advise how to get .tbl output that matches my species, for example a plant?

buildSummary.pl -species Palmae rmrun.out > mysummary.tbl

The resulting mysummary.tbl is different from what I want.

ohan-Bioinfo avatar Nov 05 '20 11:11 ohan-Bioinfo

buildSummary.pl does not have any species-specific output formatting options; -species is used to describe repeats as "lineage-specific" vs "ancestral". Are you looking for the output format of ProcessRepeats? ProcessRepeats has different headings/groupings for "mammal", "mouse", "primates", or "anything else", depending on what was specified for -species (or -lib) for RepeatMasker or ProcessRepeats.

jebrosen avatar Nov 09 '20 18:11 jebrosen