Example with CRISPR output files (piler-cr and CRISPRDetect) produces empty file
Dear authors/developers,
Please find the relevant information for my issue below. Please do not hesitate to ask for more information.
Looking forward to hearing from you.
Expected Behavior
A non-empty output file, similar to that of a regular run.
Current Behavior
Empty output file produced.
Steps to Reproduce (for bugs)
$ rm -rf tmpFolder
$ mkdir tmpFolder
$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder
Spacepharer Output (for bugs)
The output file is empty, although SpacePHARER worked when applied to the FASTA-format spacers. Here is the stdout of the run:
predictions.tsv exists and will be overwritten
easy-predict examples/crisprdetect_test examples/pilercr_test output/targetSetDB predictions.tsv tmpFolder
MMseqs Version: 5.c2e680a
Taxonomy mapping file
NCBI tax dump directory
Substitution matrix nucl:nucleotide.out,aa:VTML40.out
<< Skipped for brevity >>
[=================================================================] 100.00% 2 0s 1ms
Time for merging to predictions.tsv: 0h 0m 1s 512ms
Time for processing: 0h 0m 2s 633ms
Context
I also tested with only the piler-cr results, to make sure that only one set of CRISPR results was being evaluated. The result was the same.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
- Git commit used (The string after "MMseqs Version:" when you execute SpacePHARER without any parameters):
MMseqs Version: 5.c2e680a
- Which SpacePHARER version was used (Statically-compiled, self-compiled, Conda, etc.):
$ mamba env export
name: spacepharer_env
channels:
- bioconda
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- bzip2=1.0.8=hd590300_5
- ca-certificates=2024.2.2=hbcca054_0
- gawk=5.3.0=ha916aea_0
- gettext=0.21.1=h27087fc_0
- gmp=6.3.0=h59595ed_0
- libgcc-ng=13.2.0=h807b86a_5
- libgomp=13.2.0=h807b86a_5
- libidn2=2.3.7=hd590300_0
- libstdcxx-ng=13.2.0=h7e041cc_5
- libunistring=0.9.10=h7f98852_0
- libxcrypt=4.4.36=hd590300_1
- libzlib=1.2.13=hd590300_5
- mpfr=4.2.1=h9458935_0
- ncurses=6.4=h59595ed_2
- openssl=3.2.1=hd590300_0
- perl=5.32.1=7_hd590300_perl5
- readline=8.2=h8228510_1
- spacepharer=5.c2e680a=pl5321h6a68c12_3
- wget=1.20.3=ha35d2d1_1
- zlib=1.2.13=hd590300_5
prefix: /ibex/user/naras0c/conda-environments/spacepharer_env
- For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: [Not applicable]
- Server specifications (especially CPU support for AVX2/SSE and amount of system memory): [Not applicable]
- Operating system and version:
$ uname -a
Linux cn605-27-r 5.14.0-162.23.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Apr 11 19:09:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Hi!
The test files in the example folder come from different bacterial genomes; only the one from fasta_test is supposed to get a hit from one of the example phage genomes. You could try searching against a larger target database (for instance, spacepharer downloaddb GenBank_phage_2018_09 targetSetDB tmpFolder).
Hope this answers your question.
Hi!
Thanks for the response! Here is what I did:
$ mkdir -p database # Start from a fresh output directory
$ mkdir -p tmpFolder # Create a fresh tmp folder
$ spacepharer downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/
downloaddb GenBank_phage_2018_09 database/targetSetDB tmpFolder/
MMseqs Version: 5.c2e680a
Create reversed setdb 1
Threads 40
Verbosity 3
2024-02-24 14:16:09 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tar [144250880/144250880] -> "genbank_phages_2018_09.tar" [1]
2024-02-24 14:16:10 URL:https://wwwuser.gwdg.de/~compbiol/spacepharer/2018_09/genbank_phages_2018_09.tsv [405478/405478] -> "genbank_phages_2018_09.tsv" [1]
tar2db genbank_phages_2018_09.tar /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb --threads 40 -v 3
Time for merging to tardb: 0h 0m 0s 81ms
Time for merging to tardb.lookup: 0h 0m 0s 409ms
Time for processing: 0h 0m 6s 409ms
createdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/tardb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb -v 3
Converting sequences
[8283] 1s 195ms
Time for merging to seqdb_h: 0h 0m 0s 383ms
Time for merging to seqdb: 0h 0m 4s 252ms
Database type: Nucleotide
Time for processing: 0h 0m 6s 161ms
createsetdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb /ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672 --reverse-fragments 0 --tax-mapping-file genbank_phages_2018_09.tsv --extractorf-spacer 0 --translation-table 1 --add-orf-stop 0 --compressed 0 --threads 40 -v 3
cp: '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' and '/ibex/user/naras0c/spacepharer_test/tmpFolder/9610124632266045672/seqdb' are the same file
Error: createsetdb failed
Perhaps I am doing something wrong here..?
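For reference, the cp message in the log above can be reproduced outside SpacePHARER: cp refuses to copy a file onto itself and exits with a non-zero status. The sketch below only illustrates that shell behaviour, using a hypothetical scratch file named seqdb; it does not show why createsetdb ended up with identical source and destination paths.

```shell
# Minimal sketch (assumption: plain POSIX shell with cp; no SpacePHARER involved).
# "seqdb" here is just a scratch file, not a real SpacePHARER database.
workdir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$workdir/seqdb"

# Copying a file onto itself is rejected by cp and yields a non-zero exit status.
if cp "$workdir/seqdb" "$workdir/seqdb" 2>/dev/null; then
    cp_status=0
else
    cp_status=$?
fi
echo "cp exit status: $cp_status"

# Remove the scratch directory again.
rm -rf "$workdir"
```

A workflow script that stops on the first failing command would abort at such a cp call, which would be consistent with the "Error: createsetdb failed" line, although that link is an assumption.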
EDIT/UPDATE
I tried running createsetdb manually, as follows:
$ tar xvf tmpFolder/9610124632266045672/genbank_phages_2018_09.tar
< stdout of unpacking the fna.gz files >
$ mkdir phages # Create and move them to a different folder
$ mv *.fna.gz phages/
$ rm -rf databases/* # Clean up output directory
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb tmpFolder/
< bunch of stdout >
$ spacepharer createsetdb phages/*.fna.gz databases/targetSetDb_rev tmpFolder/ --reverse-fragments 1
< bunch of stdout >
This seems to have worked without issue, so I proceeded to run the command I originally wanted to run with the example CRISPR data:
$ spacepharer easy-predict examples/crisprdetect_test examples/pilercr_test databases/targetSetDb predictions.tsv tmpFolder
< bunch of stdout >
The predictions file is not empty this time around.
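A quick way to check this from the shell is sketched below. The helper name check_predictions is hypothetical (it is not part of SpacePHARER), and it only tests whether the file has any content and counts its lines; it does not validate the predictions themselves.

```shell
# Hypothetical helper: report whether a results file exists and has content.
check_predictions() {
    file="$1"
    if [ -s "$file" ]; then
        # grep -c '' counts every line in the file.
        echo "non-empty (lines: $(grep -c '' "$file"))"
    else
        echo "empty or missing"
    fi
}
```

It could be called as check_predictions predictions.tsv right after the easy-predict run.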
Let me know if you need more information :)