CheckM2
UnicodeEncodeError when running testrun and on real data.
Hi,
I installed checkm2 using the yml file, and downloaded the database without issue.
I get the following Unicode encoding error in the testrun, and also when I run it on a small set (3 genomes) of my own data.
Has anyone seen this error, or have advice on how to fix this?
Thank you, Patricia
patricia@sulfur:~$ mamba env create -n checkm2 -f checkm2.yml
patricia@sulfur:~$ conda activate checkm2
(checkm2) patricia@sulfur:~$ pip install CheckM2
(checkm2) patricia@sulfur:~$ checkm2 -h
____ _ _ __ __ ____
/ ___| |__ ___ ___| | _| \/ |___ \
| | | '_ \ / _ \/ __| |/ / |\/| | __) |
| |___| | | | __/ (__| <| | | |/ __/
\____|_| |_|\___|\___|_|\_\_| |_|_____|
...::: CheckM2 v1.0.1 :::...
General usage:
predict -> Predict the completeness and contamination of genome bins in a folder.
testrun -> Runs Checkm2 on internal test genomes to ensure it runs without errors.
database -> Download and set up required CheckM2 DIAMOND database for annotation
Use checkm2 <command> -h for command-specific help.
(checkm2) patricia@sulfur:~$ checkm2 database --download --path /storage1/data10/databases/checkm2/
[08/13/2024 12:26:42 PM] INFO: Command: Download database. Checking internal path information.
[08/13/2024 12:26:44 PM] INFO: Downloading https://zenodo.org/api/records/5571251/files/checkm2_database.tar.gz/content to /storage1/data10/databases/checkm2/checkm2_database.tar.gz.
100%|###################################################################################| 1.74G/1.74G [01:30<00:00, 19.2MiB/s]
[08/13/2024 12:28:15 PM] INFO: Extracting files from archive...
[08/13/2024 12:28:40 PM] INFO: Verifying version and checksums...
[08/13/2024 12:28:40 PM] INFO: Verification success.
[08/13/2024 12:28:48 PM] INFO: Diamond DATABASE downloaded successfully! Consider running <checkm2 testrun> to verify everything works.
(checkm2) patricia@sulfur:~$ checkm2 testrun
[08/13/2024 12:30:27 PM] INFO: Test run: Running quality prediction workflow on test genomes with 1 threads.
[08/13/2024 12:30:27 PM] INFO: Running checksum on test genomes.
[08/13/2024 12:30:27 PM] INFO: Checksum successful.
[08/13/2024 12:30:29 PM] INFO: Calling genes in 3 bins with 1 threads:
Finished processing 3 of 3 (100.00%) bins.
[08/13/2024 12:30:58 PM] INFO: Calculating metadata for 3 bins with 1 threads:
Finished processing 3 of 3 (100.00%) bin metadata.
[08/13/2024 12:30:59 PM] INFO: Annotating input genomes with DIAMOND using 1 threads
Traceback (most recent call last):
File "/home/patricia/miniconda3/envs/checkm2/bin/checkm2", line 265, in <module>
predictor.prediction_wf(False, 'auto', False, False, False)
File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/predictQuality.py", line 135, in prediction_wf
diamond_out = diamond_search.run(prodigal_files)
File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/diamond.py", line 119, in run
self.__call_diamond(protein_chunks, diamond_out)
File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/diamond.py", line 74, in __call_diamond
sequenceClasses.SeqReader().write_fasta(seq_object, temp_diamond_input.name)
File "/home/patricia/miniconda3/envs/checkm2/lib/python3.8/site-packages/checkm2/sequenceClasses.py", line 104, in write_fasta
fout.write('>' + seqId + '\n')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u03a9' in position 6: ordinal not in range(256)
I am also running into the same error. I am running CheckM2 v1.0.2.
[08/15/2024 10:18:11 AM] INFO: Annotating input genomes with DIAMOND using 30 threads
Traceback (most recent call last):
File "/home/nala0006/miniconda3/envs/checkm2/bin/checkm2", line 245, in <module>
args.stdout, args.resume, args.remove_intermediates, args.ttable)
File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/predictQuality.py", line 135, in prediction_wf
diamond_out = diamond_search.run(prodigal_files)
File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/diamond.py", line 119, in run
self.__call_diamond(protein_chunks, diamond_out)
File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/diamond.py", line 74, in __call_diamond
sequenceClasses.SeqReader().write_fasta(seq_object, temp_diamond_input.name)
File "/home/nala0006/miniconda3/envs/checkm2/lib/python3.6/site-packages/checkm2/sequenceClasses.py", line 104, in write_fasta
fout.write('>' + seqId + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u03a9' in position 33: ordinal not in range(128)
Can anyone from CheckM2 help us, please?
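Both tracebacks fail on the same line (`fout.write('>' + seqId + '\n')`) and on the same character, `'\u03a9'` (Ω). Only the codec differs: `latin-1` in one report and `ascii` in the other. That pattern is what Python produces when a text file is opened without an explicit `encoding` and falls back to the locale's preferred encoding. Whether `write_fasta` in CheckM2 actually opens its file that way is an assumption; the tracebacks do not show the `open()` call. The sketch below reproduces the symptom in isolation. The helper names and the sequence ID are hypothetical and are not taken from CheckM2.

```python
import locale
import os
import tempfile

# Hypothetical sequence ID containing U+03A9 (Ω), the character named in both tracebacks.
SEQ_ID = "contig\u03a9_1"


def write_fasta_header(path, seq_id, encoding=None):
    # encoding=None makes open() use the locale's preferred encoding.
    # ASSUMPTION: this mirrors how the failing write is set up; not confirmed from CheckM2 source.
    with open(path, "w", encoding=encoding) as fout:
        fout.write(">" + seq_id + "\n")


def try_write(encoding):
    # Write to a throwaway file and report whether the header could be encoded.
    fd, path = tempfile.mkstemp(suffix=".faa")
    os.close(fd)
    try:
        write_fasta_header(path, SEQ_ID, encoding=encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return "UnicodeEncodeError: " + exc.reason
    finally:
        os.remove(path)


if __name__ == "__main__":
    # The encoding open() falls back to when none is given.
    print("locale preferred encoding:", locale.getpreferredencoding(False))
    for enc in ("ascii", "latin-1", "utf-8"):
        print(enc, "->", try_write(enc))
```

If the first printed line reports something other than UTF-8 on your machine, it may be worth trying a UTF-8 locale before running `checkm2` (for example, exporting `LC_ALL` set to a UTF-8 locale that is installed on your system). This is an untested suggestion, not a confirmed fix, and it does not explain where the Ω in the sequence ID comes from.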