possible to re-do clustering without repeating ANI calcs?
I see that you can split primary and secondary clustering into 2 steps, but I was wondering if I'm missing a way to just re-run the secondary clustering without re-doing all of the fastANI calculations. For example, if I want to compare average vs single linkage, is there a way to use the existing ANI calculations? I guess I had assumed that the ANI comparison is the slow step so re-doing hierarchical clustering with a different algorithm would be easy, but maybe this is not true. Thanks for your advice, and for supporting this great software.
Hi @rrohwer - yes, if you just run dRep again with the same output directory, it will re-load all the comparisons that have been done and run much more quickly
Thank you! perfect!
Hi @MrOlm , sorry I closed this before I re-ran it and I still have a question! I thought it would just over-write files, but it seems like I need to move or delete some in order to have it recluster. I'm just not sure which ones it needs (so which to not delete). For example, can I just delete or rename data_tables folder, or for example is it using the ndb file for the ANI info?
Here in my error output file I see it stopped after encountering some existing files:
***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
Both Bdb and a genome list are found- either don't include a genome list or start a new work directory!
And here in one of the data_tables files I can see that it was not overwritten (because clustering is still "average" instead of "single"):
$ head -n 3 data_tables/Cdb.csv
genome,secondary_cluster,threshold,cluster_method,comparison_algorithm,primary_cluster
3300042997_100.fna,1_1,0.040000000000000036,average,fastANI,1
3300044715_55.fna,1_1,0.040000000000000036,average,fastANI,1
And just for reference, here is how I ran it the first time and how I re-ran it (so I literally only added the --clusterAlg single flag). When I re-ran, I copied the old data_tables folder elsewhere to save it, but copied instead of moved, so all of the first run's files were still there.
dRep dereplicate drep_all_80_96 -p 128 -g paths_table_for_drep_all_80_96.txt -comp 50 -con 10 --genomeInfo checkM_table_for_drep_all_80_96.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --multiround_primary_clustering --skip_plots
dRep dereplicate drep_all_80_96 -p 128 -g paths_table_for_drep_all_80_96.txt -comp 50 -con 10 --genomeInfo checkM_table_for_drep_all_80_96.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots
Hi @rrohwer - you could either 1) delete / move all the files in the data_tables folder, 2) just delete Bdb.csv (that's the only one that is causing a problem), or 3) remove -g paths_table_for_drep_all_80_96.txt from your command and not delete any of the files in the data_tables folder. All will work!
Best, Matt
Hi Matt,
It's still not working for me. First I tried removing the -g argument, and I assume it was re-doing the full run because my job timed out on the server. So I made a smaller test-set of genomes to test on a local install.
However, I tried running it several ways and it seems like it is still re-doing both the mash clustering and fastANI pairwise ANI comparisons.
- first I tried completely removing the data_tables folder
- then I tried leaving the data_tables folder and removing the -g argument
- then I tried deleting just data_tables/Bdb.csv (and still including -g)
For all of these, the program took the same amount of time (just 2 minutes on the test data, but still, it wasn't faster with only re-clustering). And in the terminal output and logger files it seems to list repeating both mash and fastANI calculations.
Thanks so much for your help! Below are details from each run.
Robin
First I simply removed the data_tables folder before re-running and I am pretty sure dRep is still re-running the whole process, including mash clustering and fastANI ANI comparisons.
Here are the commands I used:
# (there are 125 genomes here, expecting ~5 primary clust, ~ 5 secondary/primary clust, and ~ 5 genomes/secondary clust.)
# first I ran with average linkage specified
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots
# then I removed the data_tables folder and re-ran with single linkage specified
mv drep_ouput/data_tables data_tables_80_96_average
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots
I see no errors and I can see that the new round has single linkage clusters in it, and no warnings.txt file was generated. But,
- Both times it took 2 minutes, so the second time was not shorter.
- And, in both the terminal output and in the log files it looks like it re-ran all of the ANI comparisons and not just the clustering part:
# terminal output from single linkage run that shouldn't have repeated ANI calculations
***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
Will filter the genome list
Loading genomes from a list
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
Running primary clustering
Running pair-wise MASH clustering
5 primary clusters made
Running secondary clustering
Running 3125 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
Loading work directory
***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
making plots
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
So next I also re-tried removing the -g argument:
# First I re-ran it fresh (with average linkage)
rm -rf drep_ouput
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots
# Then I re-ran it with the data_tables folder left in place but without -g (with single linkage)
dRep dereplicate drep_ouput -p 50 -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots
But again although I see that it calculated single linkage, it also still took 2 minutes, and in the terminal output and the logfile it looks like it re-ran clustering:
# terminal output for re-running with single-linkage by removing the -g paths.txt argument
***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
NOTE: Clustering already exists! This will not be filtered! Be sure you know what you're doing
Will filter Bdb
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
Running primary clustering
Running pair-wise MASH clustering
5 primary clusters made
Running secondary clustering
Running 3125 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
Loading work directory
***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
making plots
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Next I tried just deleting Bdb, but that also didn't work:
rm drep_output/data_tables/Bdb.csv
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots
But again it looks like it completed the whole process:
# terminal output after deleting Bdb.csv
***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
Will filter the genome list
Loading genomes from a list
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
Running primary clustering
Running pair-wise MASH clustering
7 primary clusters made
Running secondary clustering
Running 2725 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
Loading work directory
***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
making plots
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Hi @rrohwer - I'm sorry for giving you bad advice, and thank you for the detailed bug report! I really appreciate it.
I believe the issue is that this caching functionality (re-using comparisons when possible) is turned off in certain circumstances, like when running --multiround_primary_clustering. I am 90% sure that adding -d to both the initial and the second run will re-enable caching, if you're willing to try one more local test.
Otherwise / alternatively, if you're comfortable dabbling in python, I can show you where the raw data is stored that dRep uses to do the clustering. Going from "distance matrix" to "clusters" is pretty simple (~5 lines of python code), so if you're comfortable with that, it's easy enough to try out whatever clustering methods / thresholds you'd like.
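For illustration, here is a minimal sketch of that "distance matrix to clusters" step. It is not dRep's own code: the genome names and ANI values are made up, the `cluster` helper is hypothetical, and distance is assumed to be 1 - ANI with a cut-off of 1 - S_ani (consistent with the 0.04 threshold that Cdb.csv shows for --S_ani 0.96). It is written in plain Python so the difference between average and single linkage is visible; in practice `scipy.cluster.hierarchy.linkage` followed by `fcluster` should do the same job in a few lines.

```python
from itertools import combinations


def cluster(dist, names, threshold, method="average"):
    """Naive agglomerative clustering (hypothetical helper, not part of dRep).

    dist maps frozenset({a, b}) -> distance. Clusters are merged, closest
    pair first, until the closest pair is farther apart than threshold.
    """
    clusters = [[n] for n in names]

    def linkage_distance(a, b):
        pairwise = [dist[frozenset((x, y))] for x in a for y in b]
        if method == "single":
            return min(pairwise)               # closest pair of members
        return sum(pairwise) / len(pairwise)   # average over all member pairs

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]]),
        )
        if linkage_distance(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return sorted(sorted(c) for c in clusters)


# Made-up ANI values for three genomes (assumption, for illustration only).
names = ["g1.fna", "g2.fna", "g3.fna"]
ani = {
    ("g1.fna", "g2.fna"): 0.97,
    ("g2.fna", "g3.fna"): 0.97,
    ("g1.fna", "g3.fna"): 0.94,
}
dist = {frozenset(pair): 1 - value for pair, value in ani.items()}
threshold = 1 - 0.96  # assumed to correspond to --S_ani 0.96

average = cluster(dist, names, threshold, method="average")
single = cluster(dist, names, threshold, method="single")
print("average:", average)
print("single: ", single)
```

In this toy example g1 and g3 are only 94% identical, but both are 97% identical to g2, so single linkage chains all three into one cluster while average linkage keeps g3 separate.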
Best, Matt
Hi Matt, Sorry I lost track of this for a little bit! But, I tried again today... and it's still not working :(
I ran it a bunch of ways on my little test dataset, but it always re-did primary clustering as well as the secondary ANI calculations. Here's a summary, I can send you the output for any of these but they all took the same amount of time (2 min) and report that they are running mash and fastANI in the output. I have just labelled them a-i to keep them straight:
a-test: Just ran my original command:
with --multiround_primary_clustering
b-test: Re-ran with single linkage (deleted data_tables folder)
without --multiround_primary_clustering
c-test: Ran original command, but in debug mode
with --multiround_primary_clustering
with --debug
d-test: Re-ran with single linkage (deleted data_tables folder)
without --multiround_primary_clustering
with --debug
e-test: Ran original command, but in debug mode (same as c-test)
with --multiround_primary_clustering
with --debug
f-test: Re-ran with single linkage (deleted data_tables folder)
with --multiround_primary_clustering
with --debug
g-test: New original command
without --multiround_primary_clustering
with --debug
h-test: Re-ran with single linkage (deleted data_tables folder)
without --multiround_primary_clustering
with --debug
i-test: Re-ran with average linkage (deleted data_tables folder)
without --multiround_primary_clustering
with --debug
without specifying --P_ani or --S_algorithm
And this is the general command that my notes above describe adjustments of:
c-test:
dRep dereplicate
drep_output
-p 15
-g paths.txt
-comp 50
-con 10
--genomeInfo checkm.csv
--S_algorithm fastANI
--P_ani 0.8
--S_ani 0.96
--clusterAlg average
--multiround_primary_clustering
--skip_plots
--debug
> termout_c-test.txt 2>&1
ie:
$ date ; dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots --debug > termout_c-test.txt 2>&1 ; date
I am more than happy to try with adjusted commands (it's all set up and only takes a minute), but I am also happy to mess with python.
But in that case, I'd love some pointers on which intermediate files are used, and which of your python scripts are running the clustering part.
Thanks so much again! Robin
Hi Robin,
Wow- thanks for running all these tests! Do you happen to have the log files from these tests that you would be willing to share? That would be helpful in seeing what's going on.
Thanks, Matt
Hi Matt,
Here is the final logger.log file ("i-test", so this was run after deleting the data_tables folder from the previous run). I didn't save all of them, just checked for errors and to see the call to mash happened. But I can run whichever command might be most insightful again no problem! Thanks for all your help!
(base) rrohwer@midgard:~/test_drep_reclustering$ cat drep_output/log/logger.log
03-28 18:50 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:50 DEBUG ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 18:50 DEBUG Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --skip_plots --debug
03-28 18:50 DEBUG dRep version 3.5.0 was run
03-28 18:50 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:50 DEBUG Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.8, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='average', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 18:50 DEBUG Starting the dereplicate operation
03-28 18:50 INFO ***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
03-28 18:50 DEBUG Loading work directory in filter
03-28 18:50 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: []
Arguments: []
03-28 18:50 DEBUG Validating filter arguments
03-28 18:50 INFO Will filter the genome list
03-28 18:50 INFO Loading genomes from a list
03-28 18:50 INFO 125 genomes were input to dRep
03-28 18:50 INFO Calculating genome info of genomes
03-28 18:50 DEBUG Filtering genomes by size
03-28 18:50 INFO 100.00% of genomes passed length filtering
03-28 18:50 DEBUG Loading provided genome quality information
03-28 18:50 DEBUG HERE IS GENOME INFO:
03-28 18:50 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 18:50 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:50 DEBUG Filtering genomes
03-28 18:50 INFO 100.00% of genomes passed checkM filtering
03-28 18:50 DEBUG Storing resulting files
03-28 18:50 INFO ***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
03-28 18:50 INFO Running primary clustering
03-28 18:50 INFO Running pair-wise MASH clustering
03-28 18:50 DEBUG Clustering MASH database
03-28 18:50 DEBUG Debug mode on - saving Mdb ASAP
03-28 18:50 DEBUG Debug mode on - saving CdbF ASAP
03-28 18:50 DEBUG Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 18:50 INFO 7 primary clusters made
03-28 18:50 INFO Running secondary clustering
03-28 18:50 INFO Running 2725 fastANI comparisons- should take ~ 12.5 min
03-28 18:50 DEBUG running cluster 2
03-28 18:50 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_svurjfadur --matrix -t 15 --minFraction 0 svurjfadur
03-28 18:51 DEBUG running cluster 3
03-28 18:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_kaolqkoyvx --matrix -t 15 --minFraction 0 kaolqkoyvx
03-28 18:51 DEBUG running cluster 4
03-28 18:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jzfolxowfz --matrix -t 15 --minFraction 0 jzfolxowfz
03-28 18:51 DEBUG running cluster 5
03-28 18:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_amgvubemxr --matrix -t 15 --minFraction 0 amgvubemxr
03-28 18:52 DEBUG running cluster 7
03-28 18:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zihrzafneb --matrix -t 15 --minFraction 0 zihrzafneb
03-28 18:52 DEBUG running cluster 1
03-28 18:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_xnxxnaejul --matrix -t 15 --minFraction 0 xnxxnaejul
03-28 18:52 DEBUG running cluster 6
03-28 18:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_kduipkyuba --matrix -t 15 --minFraction 0 kduipkyuba
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Clustering ANIn database
03-28 18:52 DEBUG making dictionary for average_ani
03-28 18:52 DEBUG list comprehension for average_ani
03-28 18:52 DEBUG averageing done
03-28 18:52 DEBUG Debug mode on - saving Ndb ASAP
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_6.pickle
03-28 18:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_7.pickle
03-28 18:52 INFO Step 4. Return output
03-28 18:52 DEBUG Main program run complete- saving output
03-28 18:52 INFO ***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
03-28 18:52 INFO Loading work directory
03-28 18:52 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:52 DEBUG Loading provided genome quality information
03-28 18:52 DEBUG HERE IS GENOME INFO:
03-28 18:52 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 18:52 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:52 DEBUG Sdb finished
03-28 18:52 DEBUG Wdb finished
03-28 18:52 DEBUG saving dereplicated genomes
03-28 18:52 INFO ***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
03-28 18:52 DEBUG Loading work directory
03-28 18:52 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:52 DEBUG evaluating ['3']
03-28 18:52 INFO will produce Widb (winner information db)
03-28 18:52 INFO Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 18:52 INFO ***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
03-28 18:52 INFO making plots
03-28 18:52 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
03-28 18:52 DEBUG Finished the dereplicate operation!
03-28 18:58 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:58 DEBUG ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 18:58 DEBUG Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --skip_plots --debug
03-28 18:58 DEBUG dRep version 3.5.0 was run
03-28 18:58 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:58 DEBUG Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.8, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='single', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 18:58 DEBUG Starting the dereplicate operation
03-28 18:58 INFO ***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
03-28 18:58 DEBUG Loading work directory in filter
03-28 18:58 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:58 DEBUG Validating filter arguments
03-28 18:58 INFO Will filter the genome list
03-28 18:58 INFO Loading genomes from a list
03-28 18:58 INFO 125 genomes were input to dRep
03-28 18:58 INFO Calculating genome info of genomes
03-28 18:58 DEBUG Filtering genomes by size
03-28 18:58 INFO 100.00% of genomes passed length filtering
03-28 18:58 DEBUG Loading provided genome quality information
03-28 18:58 DEBUG HERE IS GENOME INFO:
03-28 18:58 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 18:58 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:58 DEBUG Filtering genomes
03-28 18:58 INFO 100.00% of genomes passed checkM filtering
03-28 18:58 DEBUG Storing resulting files
03-28 18:58 INFO ***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
03-28 18:58 INFO Running primary clustering
03-28 18:58 INFO Running pair-wise MASH clustering
03-28 18:58 DEBUG Clustering MASH database
03-28 18:58 DEBUG Debug mode on - saving Mdb ASAP
03-28 18:58 DEBUG Debug mode on - saving CdbF ASAP
03-28 18:58 DEBUG Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 18:58 INFO 5 primary clusters made
03-28 18:58 INFO Running secondary clustering
03-28 18:58 INFO Running 3125 fastANI comparisons- should take ~ 12.6 min
03-28 18:58 DEBUG running cluster 5
03-28 18:58 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_sjsixamyqr --matrix -t 15 --minFraction 0 sjsixamyqr
03-28 18:59 DEBUG running cluster 4
03-28 18:59 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_rgcuqtgojq --matrix -t 15 --minFraction 0 rgcuqtgojq
03-28 18:59 DEBUG running cluster 2
03-28 18:59 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_tjdycdofsh --matrix -t 15 --minFraction 0 tjdycdofsh
03-28 18:59 DEBUG running cluster 3
03-28 18:59 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jiezgnhrld --matrix -t 15 --minFraction 0 jiezgnhrld
03-28 19:00 DEBUG running cluster 1
03-28 19:00 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_hhafcwpfbe --matrix -t 15 --minFraction 0 hhafcwpfbe
03-28 19:00 DEBUG Clustering ANIn database
03-28 19:00 DEBUG making dictionary for average_ani
03-28 19:00 DEBUG list comprehension for average_ani
03-28 19:00 DEBUG averageing done
03-28 19:00 DEBUG Clustering ANIn database
03-28 19:00 DEBUG making dictionary for average_ani
03-28 19:00 DEBUG list comprehension for average_ani
03-28 19:00 DEBUG averageing done
03-28 19:00 DEBUG Clustering ANIn database
03-28 19:00 DEBUG making dictionary for average_ani
03-28 19:00 DEBUG list comprehension for average_ani
03-28 19:00 DEBUG averageing done
03-28 19:00 DEBUG Clustering ANIn database
03-28 19:00 DEBUG making dictionary for average_ani
03-28 19:00 DEBUG list comprehension for average_ani
03-28 19:00 DEBUG averageing done
03-28 19:00 DEBUG Clustering ANIn database
03-28 19:00 DEBUG making dictionary for average_ani
03-28 19:00 DEBUG list comprehension for average_ani
03-28 19:00 DEBUG averageing done
03-28 19:00 DEBUG Debug mode on - saving Ndb ASAP
03-28 19:00 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 19:00 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 19:00 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 19:00 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 19:00 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 19:00 INFO Step 4. Return output
03-28 19:00 DEBUG Main program run complete- saving output
03-28 19:00 INFO ***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
03-28 19:00 INFO Loading work directory
03-28 19:00 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:00 DEBUG Loading provided genome quality information
03-28 19:00 DEBUG HERE IS GENOME INFO:
03-28 19:00 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 19:00 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:00 DEBUG Sdb finished
03-28 19:00 DEBUG Wdb finished
03-28 19:00 DEBUG saving dereplicated genomes
03-28 19:00 INFO ***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
03-28 19:00 DEBUG Loading work directory
03-28 19:00 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:00 DEBUG evaluating ['3']
03-28 19:00 INFO will produce Widb (winner information db)
03-28 19:00 INFO Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 19:00 INFO ***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
03-28 19:00 INFO making plots
03-28 19:00 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
03-28 19:00 DEBUG Finished the dereplicate operation!
03-28 19:50 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 19:50 DEBUG ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 19:50 DEBUG Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_ani 0.96 --clusterAlg average --skip_plots --debug
03-28 19:50 DEBUG dRep version 3.5.0 was run
03-28 19:50 DEBUG !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 19:50 DEBUG Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.9, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='average', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 19:50 DEBUG Starting the dereplicate operation
03-28 19:50 INFO ***************************************************
..:: dRep dereplicate Step 1. Filter ::..
***************************************************
03-28 19:50 DEBUG Loading work directory in filter
03-28 19:50 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:50 DEBUG Validating filter arguments
03-28 19:50 INFO Will filter the genome list
03-28 19:50 INFO Loading genomes from a list
03-28 19:50 INFO 125 genomes were input to dRep
03-28 19:50 INFO Calculating genome info of genomes
03-28 19:50 DEBUG Filtering genomes by size
03-28 19:50 INFO 100.00% of genomes passed length filtering
03-28 19:50 DEBUG Loading provided genome quality information
03-28 19:50 DEBUG HERE IS GENOME INFO:
03-28 19:50 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 19:50 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:50 DEBUG Filtering genomes
03-28 19:50 INFO 100.00% of genomes passed checkM filtering
03-28 19:50 DEBUG Storing resulting files
03-28 19:50 INFO ***************************************************
..:: dRep dereplicate Step 2. Cluster ::..
***************************************************
03-28 19:50 INFO Running primary clustering
03-28 19:50 INFO Running pair-wise MASH clustering
03-28 19:50 DEBUG Clustering MASH database
03-28 19:50 DEBUG Debug mode on - saving Mdb ASAP
03-28 19:50 DEBUG Debug mode on - saving CdbF ASAP
03-28 19:50 DEBUG Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 19:50 INFO 19 primary clusters made
03-28 19:50 INFO Running secondary clustering
03-28 19:50 INFO Running 975 fastANI comparisons- should take ~ 12.3 min
03-28 19:50 DEBUG running cluster 3
03-28 19:50 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_lebzgrbfdu --matrix -t 15 --minFraction 0 lebzgrbfdu
03-28 19:50 DEBUG running cluster 8
03-28 19:50 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_lpxkxqltcn --matrix -t 15 --minFraction 0 lpxkxqltcn
03-28 19:51 DEBUG running cluster 10
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_oeufdssswl --matrix -t 15 --minFraction 0 oeufdssswl
03-28 19:51 DEBUG running cluster 2
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_ggqcsdviws --matrix -t 15 --minFraction 0 ggqcsdviws
03-28 19:51 DEBUG running cluster 12
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_fjnphporjv --matrix -t 15 --minFraction 0 fjnphporjv
03-28 19:51 DEBUG running cluster 19
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_rmlcctiril --matrix -t 15 --minFraction 0 rmlcctiril
03-28 19:51 DEBUG running cluster 16
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_dsudxayffb --matrix -t 15 --minFraction 0 dsudxayffb
03-28 19:51 DEBUG running cluster 7
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_usqrcvjubx --matrix -t 15 --minFraction 0 usqrcvjubx
03-28 19:51 DEBUG running cluster 1
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zzskslkdbi --matrix -t 15 --minFraction 0 zzskslkdbi
03-28 19:51 DEBUG running cluster 9
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jnjulbsegi --matrix -t 15 --minFraction 0 jnjulbsegi
03-28 19:51 DEBUG running cluster 13
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_sbnbdewtzf --matrix -t 15 --minFraction 0 sbnbdewtzf
03-28 19:51 DEBUG running cluster 6
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_mkuqyzpnda --matrix -t 15 --minFraction 0 mkuqyzpnda
03-28 19:51 DEBUG running cluster 4
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_vujrjczanp --matrix -t 15 --minFraction 0 vujrjczanp
03-28 19:51 DEBUG running cluster 14
03-28 19:51 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zrxtlntvjd --matrix -t 15 --minFraction 0 zrxtlntvjd
03-28 19:52 DEBUG running cluster 5
03-28 19:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_dvnwqyggnq --matrix -t 15 --minFraction 0 dvnwqyggnq
03-28 19:52 DEBUG running cluster 18
03-28 19:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_gvkfjakxlh --matrix -t 15 --minFraction 0 gvkfjakxlh
03-28 19:52 DEBUG running cluster 11
03-28 19:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_eegiqbnbhz --matrix -t 15 --minFraction 0 eegiqbnbhz
03-28 19:52 DEBUG running cluster 15
03-28 19:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_udmakopbhg --matrix -t 15 --minFraction 0 udmakopbhg
03-28 19:52 DEBUG running cluster 17
03-28 19:52 DEBUG /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_etelwzsvin --matrix -t 15 --minFraction 0 etelwzsvin
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Clustering ANIn database
03-28 19:52 DEBUG making dictionary for average_ani
03-28 19:52 DEBUG list comprehension for average_ani
03-28 19:52 DEBUG averageing done
03-28 19:52 DEBUG Debug mode on - saving Ndb ASAP
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_6.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_7.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_8.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_9.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_10.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_11.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_12.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_13.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_14.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_15.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_16.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_17.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_18.pickle
03-28 19:52 DEBUG Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_19.pickle
03-28 19:52 INFO Step 4. Return output
03-28 19:52 DEBUG Main program run complete- saving output
03-28 19:52 INFO ***************************************************
..:: dRep dereplicate Step 3. Choose ::..
***************************************************
03-28 19:52 INFO Loading work directory
03-28 19:52 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_16', 'secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'secondary_linkage_cluster_13', 'primary_linkage', 'secondary_linkage_cluster_8', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_14', 'secondary_linkage_cluster_17', 'secondary_linkage_cluster_10', 'secondary_linkage_cluster_12', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_15', 'secondary_linkage_cluster_11', 'secondary_linkage_cluster_18', 'secondary_linkage_cluster_9', 'secondary_linkage_cluster_19', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:52 DEBUG Loading provided genome quality information
03-28 19:52 DEBUG HERE IS GENOME INFO:
03-28 19:52 DEBUG
genome completeness contamination
0 ME2000-03-30D8pf_3300042899_group1_bin108.fna 58.21 0.59
1 ME2000-05-11D8pf_3300042483_group1_bin124.fna 55.63 2.87
2 ME2000-05-11D8pf_3300042483_group1_bin171.fna 58.77 0.37
3 ME2000-05-11D8pf_3300042483_group1_bin177.fna 50.54 2.80
4 ME2000-05-25pf_3300042154_group1_bin40.fna 53.18 0.14
03-28 19:52 DEBUG There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:52 DEBUG Sdb finished
03-28 19:52 DEBUG Wdb finished
03-28 19:52 DEBUG saving dereplicated genomes
03-28 19:52 INFO ***************************************************
..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************
03-28 19:52 DEBUG Loading work directory
03-28 19:52 DEBUG Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_16', 'secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'secondary_linkage_cluster_13', 'primary_linkage', 'secondary_linkage_cluster_8', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_14', 'secondary_linkage_cluster_17', 'secondary_linkage_cluster_10', 'secondary_linkage_cluster_12', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_15', 'secondary_linkage_cluster_11', 'secondary_linkage_cluster_18', 'secondary_linkage_cluster_9', 'secondary_linkage_cluster_19', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:52 DEBUG evaluating ['3']
03-28 19:52 INFO will produce Widb (winner information db)
03-28 19:52 INFO Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 19:52 INFO ***************************************************
..:: dRep dereplicate Step 5. Analyze ::..
***************************************************
03-28 19:52 INFO making plots
03-28 19:52 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
..:: dRep dereplicate finished ::..
Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
03-28 19:52 DEBUG Finished the dereplicate operation!
Oh wait, the logger file I pasted above has the last 3 tests concatenated, since only the data_tables folder was removed between them. So it actually contains tests g, h, and i:
g - run without multiround clustering and with debug
h - re-run, but with single instead of average linkage
i - re-run, now with average instead of single linkage, and also without specifying P_ani or S_algorithm, since those steps shouldn't be re-done
Thanks! Robin
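In case it helps anyone reading a concatenated logger.log like the one above, here is a rough Python sketch that splits it back into individual runs and pulls out the linkage method, the number of primary clusters, and the number of fastANI comparisons for each. The function names (split_runs, summarize_run) are made up for illustration, the sample lines are abbreviated from the log above, and the regular expressions assume the log wording shown there. Because each run's "!!!!" banner line comes before the "Logger started up at" line, that banner ends up at the tail of the previous run, which does not affect the summary.

```python
import re

# Assumption: every dRep run writes a "Logger started up at" line near its start.
RUN_START = "Logger started up at"


def split_runs(log_text):
    """Split a concatenated dRep log into one list of lines per run."""
    runs = []
    for line in log_text.splitlines():
        # Start a new run at each marker, or at the very first line of the text.
        if RUN_START in line or not runs:
            runs.append([])
        runs[-1].append(line)
    return runs


def summarize_run(lines):
    """Extract a few key values from the lines of a single run."""
    text = "\n".join(lines)
    alg = re.search(r"clusterAlg='(\w+)'", text)
    primary = re.search(r"(\d+) primary clusters made", text)
    fastani = re.search(r"Running (\d+) fastANI comparisons", text)
    return {
        "clusterAlg": alg.group(1) if alg else None,
        "primary_clusters": int(primary.group(1)) if primary else None,
        "fastANI_comparisons": int(fastani.group(1)) if fastani else None,
    }


# Abbreviated sample lines taken from the log pasted above.
sample = """\
03-28 18:58 DEBUG Namespace(P_ani=0.8, clusterAlg='single', work_directory='drep_output')
03-28 18:58 INFO 5 primary clusters made
03-28 18:58 INFO Running 3125 fastANI comparisons- should take ~ 12.6 min
03-28 19:50 DEBUG ***Logger started up at drep_output/log/logger.log***
03-28 19:50 DEBUG Namespace(P_ani=0.9, clusterAlg='average', work_directory='drep_output')
03-28 19:50 INFO 19 primary clusters made
03-28 19:50 INFO Running 975 fastANI comparisons- should take ~ 12.3 min
"""

summaries = [summarize_run(run) for run in split_runs(sample)]
for summary in summaries:
    print(summary)
```

A "Running N fastANI comparisons" line in every run is a quick sign that the ANI step was repeated rather than re-loaded.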
Hi Robin,
OK, I've looked into this more, and I believe the issue is that the fastANI step doesn't support loading old results, which is why the log shows the comparisons being re-run each time. Apologies for not remembering this earlier.
I took the liberty of writing up some example Python code showing how to redo the clustering from the output Ndb.csv file. You can adjust the clustering method and the threshold, and the "Cdb" table that is returned will reflect those changes.
Let me know if you have any questions!
Best, Matt
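# Imports
import drep
import drep.WorkDirectory
import drep.d_cluster
import drep.d_cluster.cluster_utils
import drep.d_cluster.utils  # imported explicitly because _cluster_Ndb is called from this module below
# Location of the dRep work directory (replace with your own)
drep_folder = '/pl/active/olm-data2/Projects/2024_Bifidobacteria_Database/bifido_genomes/All_Genomes/drep_genomes_95/'
# Load Ndb (the pairwise ANI table saved by the previous run)
wd = drep.WorkDirectory.WorkDirectory(drep_folder)
Ndb = wd.get_db('Ndb', return_none=False)
# Recluster; change clusterAlg (e.g. 'single') and S_ani as needed
Cdb, c2ret = drep.d_cluster.utils._cluster_Ndb(Ndb, clusterAlg='average', S_ani=0.95)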
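To see why the linkage choice can change the result even when the ANI values are identical, here is a small dependency-free sketch of threshold-based hierarchical clustering. It is not dRep's code: the recluster function, the genome names A, B, C, and the ANI values are all invented for illustration. It assumes one symmetric ANI value per genome pair and a distance cutoff of 1 - S_ani, which matches the threshold column in the Cdb.csv shown earlier for S_ani 0.96.

```python
from itertools import combinations


def recluster(ani, s_ani, method="average"):
    """Agglomerative clustering from symmetric pairwise ANI values.

    ani:    dict mapping frozenset({genome1, genome2}) -> ANI in [0, 1]
    s_ani:  ANI threshold; clusters farther apart than 1 - s_ani stay separate
    method: 'single' (closest pair) or 'average' (mean of all pairs)
    """
    if method not in ("single", "average"):
        raise ValueError(f"unsupported method: {method}")

    genomes = sorted({g for pair in ani for g in pair})
    clusters = [[g] for g in genomes]
    threshold = 1 - s_ani

    def distance(c1, c2):
        # Distances between every member of c1 and every member of c2.
        d = [1 - ani[frozenset((a, b))] for a in c1 for b in c2]
        return min(d) if method == "single" else sum(d) / len(d)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        (i, j), best = min(
            (((i, j), distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # nothing left that is close enough to merge
        clusters[i] += clusters.pop(j)  # j > i, so index i is still valid

    return sorted(sorted(c) for c in clusters)


# Hypothetical ANI values: A~B and B~C are close, but A~C is not.
ani = {
    frozenset(("A", "B")): 0.98,
    frozenset(("B", "C")): 0.97,
    frozenset(("A", "C")): 0.94,
}

single = recluster(ani, s_ani=0.96, method="single")
average = recluster(ani, s_ani=0.96, method="average")
print("single: ", single)
print("average:", average)
```

With these made-up values, single linkage chains all three genomes into one cluster through B, while average linkage keeps C separate because its mean distance to A and B exceeds the cutoff. This is the kind of difference to look for when comparing Cdb tables produced with different clusterAlg settings.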