
Example with CRISPR output files (piler-cr and CRISPRDetect) produces empty file

Open shaman-narayanasamy opened this issue 1 year ago • 2 comments

Dear authors/developers,

Please find the relevant information for my issue below. Please do not hesitate to ask for more information.

Looking forward to hearing from you.

Expected Behavior

A non-empty output file should be produced, similar to that of a regular run.

Current Behavior

Empty output file produced.

Steps to Reproduce (for bugs)

$ rm -rf tmpFolder
$ mkdir tmpFolder
$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder

Spacepharer Output (for bugs)

The output file is empty. SpacePHARER did work when applied to the FASTA-format spacers. Here is the stdout of the run:

predictions.tsv exists and will be overwritten
easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder 

MMseqs Version:                        	5.c2e680a
Taxonomy mapping file                  	
NCBI tax dump directory                	
Substitution matrix                    	nucl:nucleotide.out,aa:VTML40.out
<< Skipped for brevity >>
[=================================================================] 100.00% 2 0s 1ms
Time for merging to predictions.tsv: 0h 0m 1s 512ms
Time for processing: 0h 0m 2s 633ms

Context

I also tested using only the piler-cr results, to make sure that only one set of CRISPR results was being evaluated. The output was the same (an empty file).

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute SpacePHARER without any parameters):
MMseqs Version:                        	5.c2e680a
  • Which SpacePHARER version was used (Statically-compiled, self-compiled, Conda, etc.):
$ mamba env export
name: spacepharer_env
channels:
  - bioconda
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=hd590300_5
  - ca-certificates=2024.2.2=hbcca054_0
  - gawk=5.3.0=ha916aea_0
  - gettext=0.21.1=h27087fc_0
  - gmp=6.3.0=h59595ed_0
  - libgcc-ng=13.2.0=h807b86a_5
  - libgomp=13.2.0=h807b86a_5
  - libidn2=2.3.7=hd590300_0
  - libstdcxx-ng=13.2.0=h7e041cc_5
  - libunistring=0.9.10=h7f98852_0
  - libxcrypt=4.4.36=hd590300_1
  - libzlib=1.2.13=hd590300_5
  - mpfr=4.2.1=h9458935_0
  - ncurses=6.4=h59595ed_2
  - openssl=3.2.1=hd590300_0
  - perl=5.32.1=7_hd590300_perl5
  - readline=8.2=h8228510_1
  - spacepharer=5.c2e680a=pl5321h6a68c12_3
  - wget=1.20.3=ha35d2d1_1
  - zlib=1.2.13=hd590300_5
prefix: /ibex/user/naras0c/conda-environments/spacepharer_env
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: [Not applicable]
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): [Not applicable]
  • Operating system and version:
$ uname -a
Linux cn605-27-r 5.14.0-162.23.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Apr 11 19:09:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

shaman-narayanasamy, Feb 23 '24 11:02

Hi! The test files in the example folder come from different bacterial genomes; only the one from fasta_test is expected to get a hit from one of the example phage genomes. You could try searching against a larger target database (for instance, spacepharer downloaddb GenBank_phage_2018_09 targetSetDB tmpFolder). Hope this answers your question.

RuoshiZhang, Feb 23 '24 13:02

Hi!

Thanks for the response! Here is what I did:

$ mkdir -p database # Start from a fresh output directory
$ mkdir -p tmpFolder # Create a fresh tmp folder
$ spacepharer downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/
downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/ 

MMseqs Version:         5.c2e680a
Create reversed setdb   1
Threads                 40
Verbosity               3

2024-02-24 14:16:09 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tar [144250880/144250880] -> "genbank_phages_2018_09.tar" [1]
2024-02-24 14:16:10 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tsv [405478/405478] -> "genbank_phages_2018_09.tsv" [1]
tar2db genbank_phages_2018_09.tar /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb --threads 40 -v 3 

Time for merging to tardb: 0h 0m 0s 81ms
Time for merging to tardb.lookup: 0h 0m 0s 409ms
Time for processing: 0h 0m 6s 409ms
createdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb -v 3 

Converting sequences
[8283] 1s 195ms
Time for merging to seqdb_h: 0h 0m 0s 383ms
Time for merging to seqdb: 0h 0m 4s 252ms
Database type: Nucleotide
Time for processing: 0h 0m 6s 161ms
createsetdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb  /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672 --reverse-fragments 0 --tax-mapping-file genbank_phages_2018_09.tsv --extractorf-spacer 0 --translation-table 1 --add-orf-stop 0 --compressed 0 --threads 40 -v 3 

cp: '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' and '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' are the same file
Error: createsetdb failed

Perhaps I am doing something wrong here..?

EDIT/UPDATE

I tried running createsetdb myself as follows:

$ tar xvf tmpFolder/9610124632266045672/genbank_phages_2018_09.tar
< stdout of unpacking the fna.gz files >

$ mkdir phages # Create and move them to a different folder
$ mv *.fna.gz phages/
$ rm -rf databases/* # Clean up output directory
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb tmpFolder/
< bunch of stdout >
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb_rev tmpFolder/ --reverse-fragments 1
< bunch of stdout >

This seems to have worked with no issue. So, I proceeded to run the command that I wanted to run with the example CRISPR data:

$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test databases/targetSetDb predictions.tsv tmpFolder
< bunch of stdout >

The predictions file is not empty this time around.
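
For anyone reproducing this, a quick way to confirm whether a run produced any predictions is to test the output file directly. The snippet below is only a minimal sketch: `check_predictions` is a hypothetical helper name (not part of SpacePHARER), and it makes no assumption about the contents of the file beyond it being line-based text.

```shell
# Hypothetical helper (not part of SpacePHARER): report whether an output
# file exists and contains at least one byte, and how many lines it has.
check_predictions() {
    file="$1"
    if [ -s "$file" ]; then
        # wc -l may pad its output with spaces on some systems; strip them.
        n=$(wc -l < "$file" | tr -d ' ')
        echo "non-empty: $n line(s)"
    else
        echo "empty or missing"
        return 1
    fi
}
```

Calling `check_predictions predictions.tsv` after an easy-predict run returns a non-zero exit status when the file is empty or missing, so it can also be used in a script to stop early.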

Let me know if you need more information :)

shaman-narayanasamy, Feb 24 '24 11:02