
possible to re-do clustering without repeating ANI calcs?

Open rrohwer opened this issue 1 year ago • 11 comments

I see that you can split primary and secondary clustering into 2 steps, but I was wondering if I'm missing a way to just re-run the secondary clustering without re-doing all of the fastANI calculations. For example, if I want to compare average vs single linkage, is there a way to use the existing ANI calculations? I guess I had assumed that the ANI comparison is the slow step so re-doing hierarchical clustering with a different algorithm would be easy, but maybe this is not true. Thanks for your advice, and for supporting this great software.

rrohwer • Feb 19 '25 16:02

Hi @rrohwer - yes, if you just run dRep again with the same output directory, it will reload all the comparisons that have already been done and run much more quickly.

MrOlm • Feb 19 '25 17:02

Thank you, perfect!

rrohwer • Feb 19 '25 20:02

Hi @MrOlm, sorry, I closed this before I re-ran it, and I still have a question! I thought it would just overwrite files, but it seems I need to move or delete some of them to get it to re-cluster. I'm just not sure which ones it needs (i.e., which ones not to delete). For example, can I just delete or rename the data_tables folder, or is it using the Ndb file for the ANI info?

Here in my error output file I see it stopped after encountering some existing files:

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
Both Bdb and a genome list are found- either don't include a genome list or start a new work directory!

And in one of the data_tables files I can see that it was not overwritten (cluster_method is still "average" instead of "single"):

$ head -n 3 data_tables/Cdb.csv
genome,secondary_cluster,threshold,cluster_method,comparison_algorithm,primary_cluster
3300042997_100.fna,1_1,0.040000000000000036,average,fastANI,1
3300044715_55.fna,1_1,0.040000000000000036,average,fastANI,1
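A quick way to confirm which settings a Cdb.csv was produced with is to tally its `cluster_method` column instead of eyeballing `head`. The minimal Python sketch below assumes only the header shown above; `summarize_cdb` is a hypothetical helper, not part of dRep. It also converts `threshold` back into an ANI cutoff, assuming (as the 0.04 value for `--S_ani 0.96` suggests) that the threshold is stored as 1 - S_ani.

```python
import csv
from collections import Counter


def summarize_cdb(lines):
    """Tally the clustering settings recorded in a dRep Cdb.csv.

    `lines` is any iterable of CSV text lines (an open file works). Only the
    columns in the header shown above are assumed to exist.
    """
    rows = list(csv.DictReader(lines))
    return {
        "n_genomes": len(rows),
        "cluster_method": dict(Counter(r["cluster_method"] for r in rows)),
        "comparison_algorithm": dict(Counter(r["comparison_algorithm"] for r in rows)),
        "n_secondary_clusters": len({r["secondary_cluster"] for r in rows}),
        # threshold is a distance; 1 - threshold recovers the ANI cutoff (rounded).
        "implied_S_ani": sorted({round(1.0 - float(r["threshold"]), 6) for r in rows}),
    }
```

For example, `with open("drep_all_80_96/data_tables/Cdb.csv", newline="") as fh: print(summarize_cdb(fh))`. For the two rows shown above it reports `average` as the only `cluster_method` and an implied S_ani of 0.96.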

And just for reference, here is how I ran it the first time and how I re-ran it (the only change was adding the --clusterAlg single flag). When I re-ran it, I copied the old data_tables folder elsewhere to save it, but I copied rather than moved it, so all of the first run's files were left in place.

dRep dereplicate drep_all_80_96 -p 128 -g paths_table_for_drep_all_80_96.txt -comp 50 -con 10 --genomeInfo checkM_table_for_drep_all_80_96.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --multiround_primary_clustering --skip_plots

dRep dereplicate drep_all_80_96 -p 128 -g paths_table_for_drep_all_80_96.txt -comp 50 -con 10 --genomeInfo checkM_table_for_drep_all_80_96.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots

rrohwer • Feb 21 '25 00:02

Hi @rrohwer - you could either 1) delete / move all the files in the data_tables folder, 2) just delete Bdb.csv (that's the only one causing a problem), or 3) remove -g paths_table_for_drep_all_80_96.txt from your command and not delete any of the files in the data_tables folder. All will work!
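Option 2 can be scripted so the old table is kept as a backup rather than deleted. This is a minimal Python sketch; `set_aside_bdb` is a hypothetical helper, and the `data_tables/Bdb.csv` location is taken from this thread. It only addresses the "Both Bdb and a genome list are found" error; by itself it does not guarantee that dRep will reuse earlier comparisons.

```python
import shutil
from pathlib import Path


def set_aside_bdb(work_dir, suffix=".bak"):
    """Rename <work_dir>/data_tables/Bdb.csv to Bdb.csv<suffix>.

    Returns the backup path, or None if there was no Bdb.csv to move.
    Refuses to overwrite an existing backup.
    """
    bdb = Path(work_dir) / "data_tables" / "Bdb.csv"
    if not bdb.exists():
        return None
    backup = bdb.with_name(bdb.name + suffix)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; move it first")
    shutil.move(str(bdb), str(backup))
    return backup
```

For example, `set_aside_bdb("drep_all_80_96")` before re-running the original command with `--clusterAlg single`.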

Best, Matt

MrOlm • Feb 21 '25 03:02

Hi Matt,

It's still not working for me. First I tried removing the -g argument, and I assume it was re-doing the full run because my job timed out on the server. So I made a smaller test-set of genomes to test on a local install.

However, I tried running it several ways and it seems like it is still re-doing both the mash clustering and fastANI pairwise ANI comparisons.

  • first I tried completely removing data_tables folder
  • then I tried leaving data_tables folder and removing the -g argument
  • then I tried deleting just data_tables/Bdb.csv (and still including -g)

For all of these, the program took the same amount of time (just 2 minutes on the test data, but still, it wasn't faster with only re-clustering). And in the terminal output and logger files it seems to list repeating both mash and fastANI calculations.
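Checking whether a run recomputed MASH and fastANI can be automated by scanning its output for the INFO lines seen in this thread's logs ("Running pair-wise MASH clustering" and "Running N fastANI comparisons"). A minimal Python sketch; `summarize_runs` is hypothetical, and it assumes those messages keep the wording they have in dRep 3.5.0. Because logger.log is appended to on every run, it splits the text at each "Logger started up" line; plain terminal output without that line is treated as a single run.

```python
import re

RUN_MARKER = "Logger started up"
MASH_LINE = "Running pair-wise MASH clustering"
FASTANI_LINE = re.compile(r"Running (\d+) fastANI comparisons")


def summarize_runs(log_text):
    """Report, per run, whether MASH ran and how many fastANI comparisons were launched."""
    parts = log_text.split(RUN_MARKER)
    # logger.log accumulates runs; drop whatever precedes the first run marker.
    runs = parts[1:] if len(parts) > 1 else parts
    return [
        {
            "mash_rerun": MASH_LINE in run,
            "fastani_comparisons": [int(n) for n in FASTANI_LINE.findall(run)],
        }
        for run in runs
    ]
```

For example, `summarize_runs(open("drep_output/log/logger.log").read())`; a re-run that truly reused earlier comparisons would be expected to show no fresh fastANI launches.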

Thanks so much for your help! Below are details from each run.

Robin


First I simply removed the data_tables folder before re-running and I am pretty sure dRep is still re-running the whole process, including mash clustering and fastANI ANI comparisons.

Here are the commands I used:

# (there are 125 genomes here; expecting ~5 primary clusters, ~5 secondary clusters per primary cluster, and ~5 genomes per secondary cluster)

# first I ran with average linkage specified
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots 

# then I removed the data_tables folder and re-ran with single linkage specified
mv drep_ouput/data_tables data_tables_80_96_average
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots 

I see no errors and I can see that the new round has single linkage clusters in it, and no warnings.txt file was generated. But,

  • Both times it took 2 minutes, so the second time was not shorter.
  • And, in both the terminal output and in the log files it looks like it re-ran all of the ANI comparisons and not just the clustering part:
# terminal output from single linkage run that shouldn't have repeated ANI calculations

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

Will filter the genome list
Loading genomes from a list
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
5 primary clusters made
Running secondary clustering
Running 3125 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

Loading work directory
***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

making plots

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
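The reported comparison count appears to scale with the square of each primary cluster's size: the debug log later in this thread shows each cluster's genome list passed to fastANI as both `--ql` and `--rl`, i.e. all-vs-all including self-comparisons. Under that assumption, 125 genomes split evenly into 5 primary clusters of 25 gives exactly the 3125 reported here; the even split is itself an assumption, based on the test set's intended 5 x 5 x 5 structure. A minimal Python sketch of the arithmetic (`fastani_comparisons` is a hypothetical helper):

```python
def fastani_comparisons(primary_cluster_sizes):
    """Count all-vs-all comparisons (self-pairs included), summed over primary clusters.

    Assumes each primary cluster is compared all-vs-all, n * n comparisons,
    as the --ql/--rl usage in the debug log suggests.
    """
    return sum(n * n for n in primary_cluster_sizes)


# Hypothetical even split: 125 genomes in 5 primary clusters of 25 each.
even_split = [25] * 5
print(sum(even_split), fastani_comparisons(even_split))  # 125 3125
```

The 2725 reported for runs with 7 primary clusters can't be checked this way without the actual cluster sizes.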

So next I also re-tried removing the -g argument:

# First I re-ran it fresh (with average linkage)
rm -rf drep_ouput
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots 

# Then I re-ran it with the data_tables folder left in place but without -g (with single linkage)
dRep dereplicate drep_ouput -p 50 -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --multiround_primary_clustering --skip_plots 

But again, although I can see that it calculated single linkage, it still took 2 minutes, and in the terminal output and the log file it looks like it re-ran clustering:

# terminal output for re-running with single-linkage by removing the -g paths.txt argument

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
NOTE: Clustering already exists! This will not be filtered! Be sure you know what you're doing
Will filter Bdb
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
5 primary clusters made
Running secondary clustering
Running 3125 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

Loading work directory
***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

making plots

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Next I tried just deleting Bdb, but that also didn't work:

rm drep_output/data_tables/Bdb.csv
dRep dereplicate drep_ouput -p 50 -g paths.txt -comp 50 -con 10 --genomeInfo checkM.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots 

But again it looks like it completed the whole process:

# terminal output after deleting Bdb.csv

***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

NOTE: Wdb already exists! This will not be filtered! Be sure you know what you're doing
Will filter the genome list
Loading genomes from a list
125 genomes were input to dRep
Calculating genome info of genomes
100.00% of genomes passed length filtering
100.00% of genomes passed checkM filtering
***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

Running primary clustering
Running pair-wise MASH clustering
7 primary clusters made
Running secondary clustering
Running 2725 fastANI comparisons- should take ~ 3.8 min
Step 4. Return output
***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

Loading work directory
***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

will produce Widb (winner information db)
Winner database saved to /home/rrohwer/test_drep_reclustering/drep_ouputdata_tables/Widb.csv
***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

making plots

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_ouput/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_ouput/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_ouput/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_ouput/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

rrohwer • Feb 25 '25 01:02

Hi @rrohwer - I'm sorry for giving you bad advice, and thank you for the detailed bug report! I really appreciate it.

I believe the issue is that this caching functionality (re-using comparisons when possible) is turned off in certain circumstances, such as when running --multiround_primary_clustering. I am 90% sure that adding -d to both the initial and the second run will re-enable caching, if you're willing to try one more local test.

Otherwise / alternatively, if you're comfortable dabbling in python, I can show you where the raw data is stored that dRep uses to do the clustering. Going from "distance matrix" to "clusters" is pretty simple (~5 lines of python code), so if you're comfortable with that, it's easy enough to try out whatever clustering methods / thresholds you'd like.
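As a rough illustration of that distance-matrix-to-clusters step, here is a minimal Python sketch using SciPy's hierarchical clustering (`linkage` + `fcluster`). It is not dRep's own code, and `recluster` / `load_rows` are hypothetical helpers. Assumptions: the pairwise table has columns named `reference`, `querry` and `ani` (believed to match dRep's Ndb.csv, but check your header and override them if they differ), ANI is stored as a fraction between 0 and 1, and the cutoff distance is 1 - S_ani (consistent with the 0.04 threshold in Cdb.csv for `--S_ani 0.96`). Pairs that were never compared, such as genomes in different primary clusters, get the maximum distance of 1 so they cannot merge at the cutoff. Averaging the two comparison directions loosely mirrors the "average_ani" step in the debug log later in this thread.

```python
import csv
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def load_rows(path):
    """Read a pairwise ANI table (e.g. data_tables/Ndb.csv) into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def recluster(rows, method="average", s_ani=0.96,
              ref_col="reference", query_col="querry", ani_col="ani"):
    """Cluster genomes from existing pairwise ANI values; needs at least 2 genomes.

    Column names are assumptions about Ndb.csv. Returns {genome: cluster_number};
    the numbers are arbitrary labels, not dRep's secondary_cluster names.
    """
    ani_by_pair = defaultdict(list)
    genomes = set()
    for row in rows:
        a, b = row[ref_col], row[query_col]
        genomes.update((a, b))
        ani_by_pair[frozenset((a, b))].append(float(row[ani_col]))

    names = sorted(genomes)
    index = {g: i for i, g in enumerate(names)}
    # Start every pair at the maximum distance (never compared), zero on the diagonal.
    dist = np.ones((len(names), len(names)))
    np.fill_diagonal(dist, 0.0)
    for pair, values in ani_by_pair.items():
        if len(pair) == 1:
            continue  # self-comparison
        i, j = (index[g] for g in pair)
        # Average A-vs-B and B-vs-A into one symmetric distance.
        dist[i, j] = dist[j, i] = 1.0 - sum(values) / len(values)

    tree = linkage(squareform(dist, checks=False), method=method)
    labels = fcluster(tree, t=1.0 - s_ani, criterion="distance")
    return dict(zip(names, labels.tolist()))
```

For example, `recluster(load_rows("drep_output/data_tables/Ndb.csv"), method="single")`, then again with `method="average"`, re-clusters the same ANI values without rerunning fastANI. On toy data with ANI(A,B) = 0.98, ANI(B,C) = 0.97 and ANI(A,C) = 0.93, single linkage chains A, B and C into one cluster at 0.96, while average linkage keeps C separate.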

Best, Matt

MrOlm • Feb 25 '25 23:02

Hi Matt, Sorry I lost track of this for a little bit! But, I tried again today... and it's still not working :(

I ran it a bunch of ways on my little test dataset, but it always re-did primary clustering as well as the secondary ANI calculations. Here's a summary; I can send you the output for any of these, but they all took the same amount of time (2 min) and report running mash and fastANI in the output. I have labelled them a-i to keep them straight:

a-test:	Just ran my original command: 
		with --multiround_primary_clustering
b-test:	Re-ran with single linkage (deleted data_tables folder)
		without --multiround_primary_clustering

c-test: Ran original command, but in debug mode
		with --multiround_primary_clustering
		with --debug
d-test:	Re-ran with single linkage (deleted data_tables folder)
		without --multiround_primary_clustering
		with --debug

e-test:	Ran original command, but in debug mode (same as c-test)
		with --multiround_primary_clustering
		with --debug
f-test: Re-ran with single linkage (deleted data_tables folder)
		with  --multiround_primary_clustering
		with --debug

g-test:	New original command
		without --multiround_primary_clustering
		with --debug
h-test: Re-ran with single linkage (deleted data_tables folder)
		without --multiround_primary_clustering
		with --debug
i-test: Re-ran with average linkage (deleted data_tables folder)
		without --multiround_primary_clustering
		with --debug
		without specifying --P_ani or --S_algorithm

And this is the general command that my notes above describe adjustments of:

c-test:
dRep dereplicate \
	drep_output \
	-p 15 \
	-g paths.txt \
	-comp 50 \
	-con 10 \
	--genomeInfo checkm.csv \
	--S_algorithm fastANI \
	--P_ani 0.8 \
	--S_ani 0.96 \
	--clusterAlg average \
	--multiround_primary_clustering \
	--skip_plots \
	--debug \
	> termout_c-test.txt 2>&1
i.e.:
$ date ; dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --multiround_primary_clustering --skip_plots --debug > termout_c-test.txt 2>&1 ; date
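Since tests a-i differ only in a few flags, the commands can be generated rather than edited by hand. A minimal Python sketch; `drep_command`, `as_shell` and `BASE_ARGS` are hypothetical names, and only flags already used in this thread appear. The returned list can be passed to `subprocess.run(cmd, check=True)`; clearing data_tables between runs, as in the tests, is a separate step.

```python
import shlex

# Flags shared by every test run in this thread (c-test values).
BASE_ARGS = [
    "dRep", "dereplicate", "drep_output",
    "-p", "15", "-g", "paths.txt",
    "-comp", "50", "-con", "10", "--genomeInfo", "checkm.csv",
    "--S_algorithm", "fastANI", "--P_ani", "0.8", "--S_ani", "0.96",
]


def drep_command(cluster_alg, multiround=False, debug=True, skip_plots=True):
    """Build the argument list for one test variant."""
    args = BASE_ARGS + ["--clusterAlg", cluster_alg]
    if multiround:
        args.append("--multiround_primary_clustering")
    if skip_plots:
        args.append("--skip_plots")
    if debug:
        args.append("--debug")
    return args


def as_shell(args):
    """Quote an argument list for logging or pasting into a terminal."""
    return shlex.join(args)
```

For example, `as_shell(drep_command("single"))` gives the h-test command (single linkage, no multiround, with --debug).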

I am more than happy to try with adjusted commands (it's all set up and only takes a minute), but I am also happy to mess with python.

But in that case, I'd love some pointers on which intermediate files are used, and which of your python scripts are running the clustering part.
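A starting point for seeing which intermediate files exist is to list what a work directory holds before and after a re-run; dRep's debug log prints similar "Datatables:" and "Cluster files:" lists when it loads the work directory. This minimal Python sketch (the `inventory` helper is hypothetical) assumes the layout visible in the logs: tables as `data_tables/*.csv` and linkage pickles under `data/Clustering_files/`. It only lists files; what the pickles contain isn't described in this thread.

```python
from pathlib import Path


def _stems(directory, pattern):
    """Sorted file stems matching pattern, or [] if the directory is missing."""
    return sorted(p.stem for p in directory.glob(pattern)) if directory.is_dir() else []


def inventory(work_dir):
    """List data tables and clustering pickles present in a dRep work directory."""
    root = Path(work_dir)
    return {
        "data_tables": _stems(root / "data_tables", "*.csv"),
        "cluster_files": _stems(root / "data" / "Clustering_files", "*.pickle"),
    }
```

Running `inventory("drep_output")` before and after a re-run shows which tables were regenerated or left in place.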

Thanks so much again! Robin

rrohwer • Mar 29 '25 01:03

Hi Robin,

Wow- thanks for running all these tests! Do you happen to have the log files from these tests that you would be willing to share? That would be helpful in seeing what's going on.

Thanks, Matt

MrOlm • Mar 29 '25 13:03

Hi Matt,

Here is the final logger.log file ("i-test", so this was run after deleting the data_tables folder from the previous run). I didn't save all of them; I just checked for errors and confirmed that the call to mash happened. But I can run whichever command might be most insightful again, no problem! Thanks for all your help!

(base) rrohwer@midgard:~/test_drep_reclustering$ cat drep_output/log/logger.log
03-28 18:50 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:50 DEBUG    ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 18:50 DEBUG    Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg average --skip_plots --debug

03-28 18:50 DEBUG    dRep version 3.5.0 was run

03-28 18:50 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

03-28 18:50 DEBUG    Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.8, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='average', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 18:50 DEBUG    Starting the dereplicate operation
03-28 18:50 INFO     ***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

03-28 18:50 DEBUG    Loading work directory in filter
03-28 18:50 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: []
Arguments: []
03-28 18:50 DEBUG    Validating filter arguments
03-28 18:50 INFO     Will filter the genome list
03-28 18:50 INFO     Loading genomes from a list
03-28 18:50 INFO     125 genomes were input to dRep
03-28 18:50 INFO     Calculating genome info of genomes
03-28 18:50 DEBUG    Filtering genomes by size
03-28 18:50 INFO     100.00% of genomes passed length filtering
03-28 18:50 DEBUG    Loading provided genome quality information
03-28 18:50 DEBUG    HERE IS GENOME INFO:
03-28 18:50 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 18:50 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:50 DEBUG    Filtering genomes
03-28 18:50 INFO     100.00% of genomes passed checkM filtering
03-28 18:50 DEBUG    Storing resulting files
03-28 18:50 INFO     ***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

03-28 18:50 INFO     Running primary clustering
03-28 18:50 INFO     Running pair-wise MASH clustering
03-28 18:50 DEBUG    Clustering MASH database
03-28 18:50 DEBUG    Debug mode on - saving Mdb ASAP
03-28 18:50 DEBUG    Debug mode on - saving CdbF ASAP
03-28 18:50 DEBUG    Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 18:50 INFO     7 primary clusters made
03-28 18:50 INFO     Running secondary clustering
03-28 18:50 INFO     Running 2725 fastANI comparisons- should take ~ 12.5 min
03-28 18:50 DEBUG    running cluster 2
03-28 18:50 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_svurjfadur --matrix -t 15 --minFraction 0 svurjfadur
03-28 18:51 DEBUG    running cluster 3
03-28 18:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_kaolqkoyvx --matrix -t 15 --minFraction 0 kaolqkoyvx
03-28 18:51 DEBUG    running cluster 4
03-28 18:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jzfolxowfz --matrix -t 15 --minFraction 0 jzfolxowfz
03-28 18:51 DEBUG    running cluster 5
03-28 18:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_amgvubemxr --matrix -t 15 --minFraction 0 amgvubemxr
03-28 18:52 DEBUG    running cluster 7
03-28 18:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zihrzafneb --matrix -t 15 --minFraction 0 zihrzafneb
03-28 18:52 DEBUG    running cluster 1
03-28 18:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_xnxxnaejul --matrix -t 15 --minFraction 0 xnxxnaejul
03-28 18:52 DEBUG    running cluster 6
03-28 18:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_kduipkyuba --matrix -t 15 --minFraction 0 kduipkyuba
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Clustering ANIn database
03-28 18:52 DEBUG    making dictionary for average_ani
03-28 18:52 DEBUG    list comprehension for average_ani
03-28 18:52 DEBUG    averageing done
03-28 18:52 DEBUG    Debug mode on - saving Ndb ASAP
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_6.pickle
03-28 18:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_7.pickle
03-28 18:52 INFO     Step 4. Return output
03-28 18:52 DEBUG    Main program run complete- saving output
03-28 18:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

03-28 18:52 INFO     Loading work directory
03-28 18:52 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:52 DEBUG    Loading provided genome quality information
03-28 18:52 DEBUG    HERE IS GENOME INFO:
03-28 18:52 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 18:52 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:52 DEBUG    Sdb finished
03-28 18:52 DEBUG    Wdb finished
03-28 18:52 DEBUG    saving dereplicated genomes
03-28 18:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

03-28 18:52 DEBUG    Loading work directory
03-28 18:52 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:52 DEBUG    evaluating ['3']
03-28 18:52 INFO     will produce Widb (winner information db)
03-28 18:52 INFO     Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 18:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

03-28 18:52 INFO     making plots
03-28 18:52 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

03-28 18:52 DEBUG    Finished the dereplicate operation!
03-28 18:58 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 18:58 DEBUG    ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 18:58 DEBUG    Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_algorithm fastANI --P_ani 0.8 --S_ani 0.96 --clusterAlg single --skip_plots --debug

03-28 18:58 DEBUG    dRep version 3.5.0 was run

03-28 18:58 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

03-28 18:58 DEBUG    Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.8, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='single', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 18:58 DEBUG    Starting the dereplicate operation
03-28 18:58 INFO     ***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

03-28 18:58 DEBUG    Loading work directory in filter
03-28 18:58 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 18:58 DEBUG    Validating filter arguments
03-28 18:58 INFO     Will filter the genome list
03-28 18:58 INFO     Loading genomes from a list
03-28 18:58 INFO     125 genomes were input to dRep
03-28 18:58 INFO     Calculating genome info of genomes
03-28 18:58 DEBUG    Filtering genomes by size
03-28 18:58 INFO     100.00% of genomes passed length filtering
03-28 18:58 DEBUG    Loading provided genome quality information
03-28 18:58 DEBUG    HERE IS GENOME INFO:
03-28 18:58 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 18:58 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 18:58 DEBUG    Filtering genomes
03-28 18:58 INFO     100.00% of genomes passed checkM filtering
03-28 18:58 DEBUG    Storing resulting files
03-28 18:58 INFO     ***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

03-28 18:58 INFO     Running primary clustering
03-28 18:58 INFO     Running pair-wise MASH clustering
03-28 18:58 DEBUG    Clustering MASH database
03-28 18:58 DEBUG    Debug mode on - saving Mdb ASAP
03-28 18:58 DEBUG    Debug mode on - saving CdbF ASAP
03-28 18:58 DEBUG    Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 18:58 INFO     5 primary clusters made
03-28 18:58 INFO     Running secondary clustering
03-28 18:58 INFO     Running 3125 fastANI comparisons- should take ~ 12.6 min
03-28 18:58 DEBUG    running cluster 5
03-28 18:58 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_sjsixamyqr --matrix -t 15 --minFraction 0 sjsixamyqr
03-28 18:59 DEBUG    running cluster 4
03-28 18:59 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_rgcuqtgojq --matrix -t 15 --minFraction 0 rgcuqtgojq
03-28 18:59 DEBUG    running cluster 2
03-28 18:59 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_tjdycdofsh --matrix -t 15 --minFraction 0 tjdycdofsh
03-28 18:59 DEBUG    running cluster 3
03-28 18:59 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jiezgnhrld --matrix -t 15 --minFraction 0 jiezgnhrld
03-28 19:00 DEBUG    running cluster 1
03-28 19:00 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_hhafcwpfbe --matrix -t 15 --minFraction 0 hhafcwpfbe
03-28 19:00 DEBUG    Clustering ANIn database
03-28 19:00 DEBUG    making dictionary for average_ani
03-28 19:00 DEBUG    list comprehension for average_ani
03-28 19:00 DEBUG    averageing done
03-28 19:00 DEBUG    Clustering ANIn database
03-28 19:00 DEBUG    making dictionary for average_ani
03-28 19:00 DEBUG    list comprehension for average_ani
03-28 19:00 DEBUG    averageing done
03-28 19:00 DEBUG    Clustering ANIn database
03-28 19:00 DEBUG    making dictionary for average_ani
03-28 19:00 DEBUG    list comprehension for average_ani
03-28 19:00 DEBUG    averageing done
03-28 19:00 DEBUG    Clustering ANIn database
03-28 19:00 DEBUG    making dictionary for average_ani
03-28 19:00 DEBUG    list comprehension for average_ani
03-28 19:00 DEBUG    averageing done
03-28 19:00 DEBUG    Clustering ANIn database
03-28 19:00 DEBUG    making dictionary for average_ani
03-28 19:00 DEBUG    list comprehension for average_ani
03-28 19:00 DEBUG    averageing done
03-28 19:00 DEBUG    Debug mode on - saving Ndb ASAP
03-28 19:00 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 19:00 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 19:00 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 19:00 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 19:00 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 19:00 INFO     Step 4. Return output
03-28 19:00 DEBUG    Main program run complete- saving output
03-28 19:00 INFO     ***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

03-28 19:00 INFO     Loading work directory
03-28 19:00 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:00 DEBUG    Loading provided genome quality information
03-28 19:00 DEBUG    HERE IS GENOME INFO:
03-28 19:00 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 19:00 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:00 DEBUG    Sdb finished
03-28 19:00 DEBUG    Wdb finished
03-28 19:00 DEBUG    saving dereplicated genomes
03-28 19:00 INFO     ***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

03-28 19:00 DEBUG    Loading work directory
03-28 19:00 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:00 DEBUG    evaluating ['3']
03-28 19:00 INFO     will produce Widb (winner information db)
03-28 19:00 INFO     Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 19:00 INFO     ***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

03-28 19:00 INFO     making plots
03-28 19:00 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

03-28 19:00 DEBUG    Finished the dereplicate operation!
03-28 19:50 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03-28 19:50 DEBUG    ***Logger started up at /home/rrohwer/test_drep_reclustering/drep_output/log/logger.log***
03-28 19:50 DEBUG    Command to run dRep was: /home/rrohwer/miniconda3/envs/drep/bin/dRep dereplicate drep_output -p 15 -g midgard_paths_table_for_test_drep.txt -comp 50 -con 10 --genomeInfo midgard_checkM_table_for_test_drep.csv --S_ani 0.96 --clusterAlg average --skip_plots --debug

03-28 19:50 DEBUG    dRep version 3.5.0 was run

03-28 19:50 DEBUG    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

03-28 19:50 DEBUG    Namespace(MASH_sketch=1000, N50_weight=0.5, P_ani=0.9, S_algorithm='fastANI', S_ani=0.96, SkipMash=False, SkipSecondary=False, centrality_weight=1, checkM_method='lineage_wf', checkm_group_size=2000, clusterAlg='average', completeness=50.0, completeness_weight=1, contamination=10.0, contamination_weight=5, cov_thresh=0.1, coverage_method='larger', debug=True, extra_weight_table=None, gen_warnings=False, genomeInfo='midgard_checkM_table_for_test_drep.csv', genomes=['midgard_paths_table_for_test_drep.txt'], greedy_secondary_clustering=False, ignoreGenomeQuality=False, length=50000, multiround_primary_clustering=False, n_PRESET='normal', operation='dereplicate', primary_chunksize=5000, processors=15, run_tertiary_clustering=False, set_recursion='0', size_weight=0, skani_extra='', skip_plots=True, strain_heterogeneity_weight=1, warn_aln=0.25, warn_dist=0.25, warn_sim=0.98, work_directory='drep_output')
03-28 19:50 DEBUG    Starting the dereplicate operation
03-28 19:50 INFO     ***************************************************
    ..:: dRep dereplicate Step 1. Filter ::..
***************************************************

03-28 19:50 DEBUG    Loading work directory in filter
03-28 19:50 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: []
Cluster files: ['secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'primary_linkage', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:50 DEBUG    Validating filter arguments
03-28 19:50 INFO     Will filter the genome list
03-28 19:50 INFO     Loading genomes from a list
03-28 19:50 INFO     125 genomes were input to dRep
03-28 19:50 INFO     Calculating genome info of genomes
03-28 19:50 DEBUG    Filtering genomes by size
03-28 19:50 INFO     100.00% of genomes passed length filtering
03-28 19:50 DEBUG    Loading provided genome quality information
03-28 19:50 DEBUG    HERE IS GENOME INFO:
03-28 19:50 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 19:50 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:50 DEBUG    Filtering genomes
03-28 19:50 INFO     100.00% of genomes passed checkM filtering
03-28 19:50 DEBUG    Storing resulting files
03-28 19:50 INFO     ***************************************************
    ..:: dRep dereplicate Step 2. Cluster ::..
***************************************************

03-28 19:50 INFO     Running primary clustering
03-28 19:50 INFO     Running pair-wise MASH clustering
03-28 19:50 DEBUG    Clustering MASH database
03-28 19:50 DEBUG    Debug mode on - saving Mdb ASAP
03-28 19:50 DEBUG    Debug mode on - saving CdbF ASAP
03-28 19:50 DEBUG    Saving primary_linkage pickle to /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/
03-28 19:50 INFO     19 primary clusters made
03-28 19:50 INFO     Running secondary clustering
03-28 19:50 INFO     Running 975 fastANI comparisons- should take ~ 12.3 min
03-28 19:50 DEBUG    running cluster 3
03-28 19:50 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_lebzgrbfdu --matrix -t 15 --minFraction 0 lebzgrbfdu
03-28 19:50 DEBUG    running cluster 8
03-28 19:50 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_lpxkxqltcn --matrix -t 15 --minFraction 0 lpxkxqltcn
03-28 19:51 DEBUG    running cluster 10
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_oeufdssswl --matrix -t 15 --minFraction 0 oeufdssswl
03-28 19:51 DEBUG    running cluster 2
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_ggqcsdviws --matrix -t 15 --minFraction 0 ggqcsdviws
03-28 19:51 DEBUG    running cluster 12
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_fjnphporjv --matrix -t 15 --minFraction 0 fjnphporjv
03-28 19:51 DEBUG    running cluster 19
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_rmlcctiril --matrix -t 15 --minFraction 0 rmlcctiril
03-28 19:51 DEBUG    running cluster 16
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_dsudxayffb --matrix -t 15 --minFraction 0 dsudxayffb
03-28 19:51 DEBUG    running cluster 7
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_usqrcvjubx --matrix -t 15 --minFraction 0 usqrcvjubx
03-28 19:51 DEBUG    running cluster 1
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zzskslkdbi --matrix -t 15 --minFraction 0 zzskslkdbi
03-28 19:51 DEBUG    running cluster 9
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_jnjulbsegi --matrix -t 15 --minFraction 0 jnjulbsegi
03-28 19:51 DEBUG    running cluster 13
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_sbnbdewtzf --matrix -t 15 --minFraction 0 sbnbdewtzf
03-28 19:51 DEBUG    running cluster 6
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_mkuqyzpnda --matrix -t 15 --minFraction 0 mkuqyzpnda
03-28 19:51 DEBUG    running cluster 4
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_vujrjczanp --matrix -t 15 --minFraction 0 vujrjczanp
03-28 19:51 DEBUG    running cluster 14
03-28 19:51 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_zrxtlntvjd --matrix -t 15 --minFraction 0 zrxtlntvjd
03-28 19:52 DEBUG    running cluster 5
03-28 19:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_dvnwqyggnq --matrix -t 15 --minFraction 0 dvnwqyggnq
03-28 19:52 DEBUG    running cluster 18
03-28 19:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_gvkfjakxlh --matrix -t 15 --minFraction 0 gvkfjakxlh
03-28 19:52 DEBUG    running cluster 11
03-28 19:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_eegiqbnbhz --matrix -t 15 --minFraction 0 eegiqbnbhz
03-28 19:52 DEBUG    running cluster 15
03-28 19:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_udmakopbhg --matrix -t 15 --minFraction 0 udmakopbhg
03-28 19:52 DEBUG    running cluster 17
03-28 19:52 DEBUG    /home/rrohwer/miniconda3/envs/drep/bin/fastANI --ql /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList --rl /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/tmp/genomeList -o /home/rrohwer/test_drep_reclustering/drep_output/data/fastANI_files/fastANI_out_etelwzsvin --matrix -t 15 --minFraction 0 etelwzsvin
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Clustering ANIn database
03-28 19:52 DEBUG    making dictionary for average_ani
03-28 19:52 DEBUG    list comprehension for average_ani
03-28 19:52 DEBUG    averageing done
03-28 19:52 DEBUG    Debug mode on - saving Ndb ASAP
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_1.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_2.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_3.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_4.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_5.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_6.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_7.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_8.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_9.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_10.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_11.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_12.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_13.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_14.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_15.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_16.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_17.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_18.pickle
03-28 19:52 DEBUG    Saving secondary_linkage pickle /home/rrohwer/test_drep_reclustering/drep_output/data/Clustering_files/ to secondary_linkage_cluster_19.pickle
03-28 19:52 INFO     Step 4. Return output
03-28 19:52 DEBUG    Main program run complete- saving output
03-28 19:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 3. Choose ::..
***************************************************

03-28 19:52 INFO     Loading work directory
03-28 19:52 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'CdbF', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_16', 'secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'secondary_linkage_cluster_13', 'primary_linkage', 'secondary_linkage_cluster_8', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_14', 'secondary_linkage_cluster_17', 'secondary_linkage_cluster_10', 'secondary_linkage_cluster_12', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_15', 'secondary_linkage_cluster_11', 'secondary_linkage_cluster_18', 'secondary_linkage_cluster_9', 'secondary_linkage_cluster_19', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:52 DEBUG    Loading provided genome quality information
03-28 19:52 DEBUG    HERE IS GENOME INFO:
03-28 19:52 DEBUG
                                          genome  completeness  contamination
0  ME2000-03-30D8pf_3300042899_group1_bin108.fna         58.21           0.59
1  ME2000-05-11D8pf_3300042483_group1_bin124.fna         55.63           2.87
2  ME2000-05-11D8pf_3300042483_group1_bin171.fna         58.77           0.37
3  ME2000-05-11D8pf_3300042483_group1_bin177.fna         50.54           2.80
4     ME2000-05-25pf_3300042154_group1_bin40.fna         53.18           0.14
03-28 19:52 DEBUG    There are the columns: ['genome', 'completeness', 'contamination']
03-28 19:52 DEBUG    Sdb finished
03-28 19:52 DEBUG    Wdb finished
03-28 19:52 DEBUG    saving dereplicated genomes
03-28 19:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 4. Evaluate ::..
***************************************************

03-28 19:52 DEBUG    Loading work directory
03-28 19:52 DEBUG    Located: /home/rrohwer/test_drep_reclustering/drep_output
Datatables: ['Cdb', 'Ndb', 'Wdb', 'Sdb', 'CdbF', 'genomeInformation', 'Bdb', 'Mdb', 'genomeInfo']
Cluster files: ['secondary_linkage_cluster_16', 'secondary_linkage_cluster_1', 'secondary_linkage_cluster_4', 'secondary_linkage_cluster_13', 'primary_linkage', 'secondary_linkage_cluster_8', 'secondary_linkage_cluster_5', 'secondary_linkage_cluster_6', 'secondary_linkage_cluster_2', 'secondary_linkage_cluster_14', 'secondary_linkage_cluster_17', 'secondary_linkage_cluster_10', 'secondary_linkage_cluster_12', 'secondary_linkage_cluster_7', 'secondary_linkage_cluster_15', 'secondary_linkage_cluster_11', 'secondary_linkage_cluster_18', 'secondary_linkage_cluster_9', 'secondary_linkage_cluster_19', 'secondary_linkage_cluster_3']
Arguments: ['cluster']
03-28 19:52 DEBUG    evaluating ['3']
03-28 19:52 INFO     will produce Widb (winner information db)
03-28 19:52 INFO     Winner database saved to /home/rrohwer/test_drep_reclustering/drep_outputdata_tables/Widb.csv
03-28 19:52 INFO     ***************************************************
    ..:: dRep dereplicate Step 5. Analyze ::..
***************************************************

03-28 19:52 INFO     making plots
03-28 19:52 INFO
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

    ..:: dRep dereplicate finished ::..

Dereplicated genomes................. /home/rrohwer/test_drep_reclustering/drep_output/dereplicated_genomes/
Dereplicated genomes information..... /home/rrohwer/test_drep_reclustering/drep_output/data_tables/Widb.csv
Figures.............................. /home/rrohwer/test_drep_reclustering/drep_output/figures/
Warnings............................. /home/rrohwer/test_drep_reclustering/drep_output/log/warnings.txt

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

03-28 19:52 DEBUG    Finished the dereplicate operation!

rrohwer, Mar 29 '25 19:03

Oh wait, the logger file I pasted above has the last 3 tests concatenated, since only data_tables was removed between them. So it actually contains tests g, h and i:

g - run without multiround clustering and with --debug
h - re-run, but with single instead of average linkage
i - re-run, now with average instead of single linkage, and also without specifying --P_ani or --S_algorithm, since those shouldn't need to be re-done

Thanks! Robin

rrohwer, Mar 29 '25 19:03
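Because the pasted logger.log holds several runs back to back, one way to see exactly what changed between them is to pull out each "Command to run dRep was:" line and compare the flags. This is a plain-Python sketch with hypothetical helper names (`runs_from_log`, `flag_values`, `diff_runs`); it assumes that a flag followed by a token not starting with a dash takes that token as its value, which holds for the commands in this log but not for every possible command line (for example, negative numbers).

```python
import shlex

NOT_SET = "<not set>"
MARKER = "Command to run dRep was:"


def runs_from_log(log_text):
    """Return the argv list of every dRep run recorded in a (concatenated) logger.log."""
    runs = []
    for line in log_text.splitlines():
        if MARKER in line:
            command = line.split(MARKER, 1)[1].strip()
            runs.append(shlex.split(command))
    return runs


def flag_values(argv):
    """Map each -flag/--flag to its value; flags without a value map to True."""
    flags = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("-"):
            # Assumption: a following token that does not start with "-" is this flag's value.
            has_value = i + 1 < len(argv) and not argv[i + 1].startswith("-")
            flags[token] = argv[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1  # positional argument (executable, operation, work directory)
    return flags


def diff_runs(old_argv, new_argv):
    """Return {flag: (old_value, new_value)} for every flag that differs between two runs."""
    old, new = flag_values(old_argv), flag_values(new_argv)
    return {
        flag: (old.get(flag, NOT_SET), new.get(flag, NOT_SET))
        for flag in sorted(set(old) | set(new))
        if old.get(flag, NOT_SET) != new.get(flag, NOT_SET)
    }
```

For example, `runs = runs_from_log(open('drep_output/log/logger.log').read())` followed by `diff_runs(runs[-2], runs[-1])`. Applied to the last two runs in the log above, it reports that `--P_ani 0.8` and `--S_algorithm fastANI` were dropped and that `--clusterAlg` switched from single to average. The Namespace line of the last run shows P_ani=0.9 rather than 0.8, and that run made 19 primary clusters (975 fastANI comparisons) instead of 5 (3125 comparisons).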

Hi Robin,

OK, I've looked into this more, and I believe the issue is that dRep doesn't support loading old fastANI results. Apologies for not remembering this earlier.

I took the liberty of writing up some example Python code showing how to re-do the clustering from the output Ndb.csv file. You can adjust the clustering method (clusterAlg) and the threshold (S_ani), and the returned Cdb will reflect those changes.

Let me know if you have any questions!

Best, Matt

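# Import
import drep
import drep.WorkDirectory
import drep.d_cluster
import drep.d_cluster.cluster_utils
import drep.d_cluster.utils  # the reclustering call below lives in drep.d_cluster.utils, so import it explicitly

# State dRep folder location (replace with your own dRep work directory)
drep_folder = '/pl/active/olm-data2/Projects/2024_Bifidobacteria_Database/bifido_genomes/All_Genomes/drep_genomes_95/'

# Load Ndb (the stored secondary ANI comparisons) from the work directory
wd = drep.WorkDirectory.WorkDirectory(drep_folder)
Ndb = wd.get_db('Ndb', return_none=False)

# Recluster with a different linkage method and/or secondary ANI threshold
Cdb, c2ret = drep.d_cluster.utils._cluster_Ndb(Ndb, clusterAlg='average', S_ani=0.95)

# Save the new cluster assignments (assuming Cdb is a pandas DataFrame; the filename is only an example)
Cdb.to_csv('Cdb_average_0.95.csv', index=False)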

MrOlm, Apr 01 '25 21:04
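The snippet above passes the whole Ndb to dRep's private `_cluster_Ndb` helper. The log shows dRep itself running one "Clustering ANIn database" step per primary cluster (5 in the single-linkage run, 19 in the last one). A rough, dRep-independent alternative is to re-do secondary clustering separately inside each primary cluster with SciPy's hierarchical clustering. This is a sketch with a hypothetical function name (`recluster_secondary`), not dRep's implementation. It assumes ANI is stored as a fraction (as `--S_ani 0.96` suggests), averages the two directions of each pair, and treats uncompared pairs as distance 1. It does not reproduce dRep's coverage filter (`cov_thresh`) or other internal details, so its labels may not match dRep's Cdb exactly.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def recluster_secondary(records, cluster_alg="average", s_ani=0.95):
    """Re-cluster genomes within each primary cluster from stored pairwise ANI.

    records: iterable of (primary_cluster, query, reference, ani) tuples, with
    ani as a fraction (0-1). cluster_alg: a SciPy linkage method such as
    "single", "average" or "complete". Returns {genome: "<primary>_<secondary>"}.
    """
    genomes_by_primary = defaultdict(set)
    ani_by_primary = defaultdict(lambda: defaultdict(list))
    for primary, query, reference, ani in records:
        genomes_by_primary[primary].update((query, reference))
        if query != reference:  # self-comparisons only register the genome
            pair = tuple(sorted((query, reference)))
            ani_by_primary[primary][pair].append(float(ani))

    labels = {}
    for primary, genome_set in genomes_by_primary.items():
        genomes = sorted(genome_set)
        if len(genomes) == 1:
            labels[genomes[0]] = f"{primary}_1"
            continue
        index = {genome: i for i, genome in enumerate(genomes)}
        # Distance = 1 - ANI; pairs never compared stay at the maximum distance of 1.
        dist = np.ones((len(genomes), len(genomes)))
        np.fill_diagonal(dist, 0.0)
        for (a, b), values in ani_by_primary[primary].items():
            d = max(0.0, 1.0 - sum(values) / len(values))  # average both directions
            dist[index[a], index[b]] = dist[index[b], index[a]] = d
        tree = linkage(squareform(dist, checks=False), method=cluster_alg)
        # Cut the tree at distance 1 - s_ani (cophenetic distance under the chosen linkage).
        flat = fcluster(tree, t=1.0 - s_ani, criterion="distance")
        for genome, secondary in zip(genomes, flat):
            labels[genome] = f"{primary}_{secondary}"
    return labels
```

To feed it from the work directory, something like `pd.read_csv('drep_output/data_tables/Ndb.csv')[['primary_cluster', 'querry', 'reference', 'ani']].itertuples(index=False, name=None)` should produce the tuples. These column names are an assumption, so check them against your own Ndb.csv header. Calling the function once with `cluster_alg="single"` and once with `"average"` then compares the two linkage methods without re-running fastANI. Single linkage can chain genomes into one cluster that average linkage keeps apart.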