Null byte '^@' in convertalis results

Open stas-malavin opened this issue 1 year ago • 1 comments

Hello,

Current Behavior

When I include the qaln and taln fields in my convertalis results, I occasionally get a null byte (shown as ^@ in Vim), which breaks a line in the downstream analysis. See the attached screenshots from Vim and LibreOffice; the TSV itself is also attached (truncated, see lines 89–90). Attachments: issue-libreoffice, issue-vim, mmseqs-issue.tsv.zip
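
One way to confirm that the stray character is a NUL byte, and to see which lines are affected, is sketched below. It assumes only POSIX `tr`, `grep`, `wc`, and `cut`. The file name `demo_null.tsv` and its contents are hypothetical stand-ins for the real convertalis output, and the line lookup assumes the file contains no 0x01 bytes of its own. Stripping the bytes is only a downstream workaround and says nothing about why convertalis emits them.

```shell
# Hypothetical demo file with one NUL byte on line 2;
# point "tsv" at the real convertalis output instead.
tsv="demo_null.tsv"
printf 'q1\tt1\tACGT\tACGT\nq2\tt2\tAC\000GT\tACGT\n' > "$tsv"

# Count NUL bytes: delete everything except NUL, then count what is left.
nul_count=$(tr -cd '\000' < "$tsv" | wc -c)
echo "NUL bytes: $nul_count"

# List affected line numbers: map NUL to 0x01 (assumed absent from the data)
# so that grep can treat the stream as ordinary text.
nul_lines=$(tr '\000' '\001' < "$tsv" | grep -n "$(printf '\001')" | cut -d: -f1)
echo "Lines with NUL: $nul_lines"

# Workaround: write a cleaned copy with all NUL bytes removed.
clean="${tsv%.tsv}.clean.tsv"
tr -d '\000' < "$tsv" > "$clean"
```

The cleaned copy keeps every other byte, so column counts and alignment strings should be unchanged apart from the removed NULs.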

Steps to Reproduce (for bugs)

# Create a taxonomyDB from PR2 database
# Get 'pr2_version_5.0.0_SSU_taxo_long.fasta' here: https://github.com/pr2database/pr2database/releases/download/v5.0.0/pr2_version_5.0.0_SSU_taxo_long.fasta.gz
# Uses seqkit and taxonkit programs (github.com/shenwei356)

sed 's/_/ /' pr2_version_5.0.0_SSU_taxo_long.fasta > pr2_to_mmseqs.fasta
echo -e "domain\tsupergroup\tdivision\tsubdivision\tclass\torder\tfamily\tgenus\tspecies" > pr2_tax
cat pr2_to_mmseqs.fasta | seqkit seq -n | sed 's/[^|]*|[^|]*|[^|]*|//' | sed 's/|/\t/g' >> pr2_tax
cat pr2_tax | taxonkit create-taxdump -O pr2_taxdump
cat pr2_to_mmseqs.fasta | seqkit seq -n | sed 's/[^|]*|[^|]*|[^|]*|//' | cut -d '|' -f 9 | taxonkit name2taxid --data-dir pr2_taxdump > pr2_name2taxid
paste <(cat pr2_to_mmseqs.fasta | seqkit seq -ni) <(cat pr2_name2taxid | cut -f2) > pr2_taxidmap
mmseqs createdb pr2_to_mmseqs.fasta pr2
mmseqs createtaxdb pr2 tmp --ncbi-tax-dump pr2_taxdump --tax-mapping-file pr2_taxidmap
mmseqs createindex pr2

# Search
mmseqs prefilter lcB1_97 pr2.idx lcB1_97_pref -s 2.0 --exact-kmer-matching 1 --max-seq-len 500 -c 0.9 --cov-mode 2
mmseqs align lcB1_97 pr2.idx lcB1_97_pref lcB1_97_aln -a 1 -e 1e-30 --alignment-mode 3 --alignment-output-mode 0 --min-aln-len 130 -c 0.9 --cov-mode 2 --max-seq-len 500 --max-accept 30
mmseqs filterdb lcB1_97_aln lcB1_97_aln_flt --beats-first --filter-column 4 --comparison-operator le
mmseqs convertalis lcB1_97 pr2 lcB1_97_aln_flt lcB1_97_aln_flt~pr2.tsv --format-output query,theader,evalue,pident,cigar,qaln,taln --search-type 3

MMseqs Output (for bugs)

The output of the convertalis command is normal.

$ mmseqs convertalis lcB1_97 /media/bioinf/Data12/MMSEQS-DB/PR2/pr2 lcB1_97_aln_flt lcB1_97_aln_flt~pr2.tsv --format-output 'query,theader,evalue,pident,cigar,qaln,taln' --search-type 3
lcB1_97_aln_flt~pr2.tsv exists and will be overwritten
convertalis lcB1_97 /media/bioinf/Data12/MMSEQS-DB/PR2/pr2 lcB1_97_aln_flt lcB1_97_aln_flt~pr2.tsv --format-output query,theader,evalue,pident,cigar,qaln,taln --search-type 3 

MMseqs Version:        	8799829d213f31b647fc69e0572a0c828c5aaf63
Substitution matrix    	aa:blosum62.out,nucl:nucleotide.out
Alignment format       	0
Format alignment output	query,theader,evalue,pident,cigar,qaln,taln
Translation table      	1
Gap open cost          	aa:11,nucl:5
Gap extension cost     	aa:1,nucl:2
Database output        	false
Preload mode           	0
Search type            	3
Threads                	64
Compressed             	0
Verbosity              	3

[=================================================================] 100.00% 61.99K 0s 133ms    
Time for merging to lcB1_97_aln_flt~pr2.tsv: 0h 0m 0s 35ms
Time for processing: 0h 0m 0s 321ms

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
$ mmseqs version
8799829d213f31b647fc69e0572a0c828c5aaf63
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): self-compiled
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
$ cmake --version
cmake version 3.22.1
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
$ cat /proc/cpuinfo [truncated to 1 core]
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
stepping        : 4
microcode       : 0x2007006
cpu MHz         : 2100.000
cache size      : 22528 KB
physical id     : 0
siblings        : 32
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit mmio_stale_data
bogomips        : 4200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

$ free
               total        used        free      shared  buff/cache   available
Mem:       791192240    20105804    63519620       33656   707566816   765066152
Swap:              0           0           0
  • Operating system and version:
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
<truncated>

stas-malavin • Apr 15 '24 16:04

I'm getting the same error. I really need help!

ekvbmipdn0811 • Dec 04 '25 07:12