Mixed speed improvements with DeepTrio
Dear DeepVariant team,
We ran both DeepTrio 1.8.0 and 1.9.0 on GIAB data as well as on our own data, both of which we have available locally. In the first case, DeepTrio 1.9.0 took about twice as long to complete as DeepTrio 1.8.0; in the second case we saw a minor speed improvement. Both cases were run on the same hardware. With the data used in your Illumina WES test case we did see the improvement noted in the release notes. Might you have an explanation for our findings?
Kind regards
Hi @MatthieuBeukers,
Could you please include the exact command that you used to run your evaluations? In addition, please specify the hardware specs.
Hi @akolesnikov, we run DeepVariant and DeepTrio as part of our pipeline on a compute cluster with multiple nodes, using the following command:
args=()
args+=("--model_type" "WES")
args+=("--ref" "/apps/data/vip/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna")
args+=("--reads_child" "vip_fam0_HG002_validated.bam")
args+=("--reads_parent1" "vip_fam0_HG003_validated.bam")
args+=("--reads_parent2" "vip_fam0_HG004_validated.bam")
args+=("--sample_name_child" "HG002")
args+=("--sample_name_parent1" "HG003")
args+=("--sample_name_parent2" "HG004")
args+=("--output_gvcf_child" "vip_fam0_HG002_chunk_0_snv.g.vcf.gz")
args+=("--output_gvcf_parent1" "vip_fam0_HG002_HG003_chunk_0_snv.g.vcf.gz")
args+=("--output_gvcf_parent2" "vip_fam0_HG002_HG004_chunk_0_snv.g.vcf.gz")
args+=("--num_shards" "6")
args+=("--regions" "regions_chunk_0.bed")
args+=("--intermediate_results_dir" "intermediate_results")
# required vcf outputs that won't be used
args+=("--output_vcf_child" "vip_fam0_HG002_chunk_0_snv.vcf.gz")
args+=("--output_vcf_parent1" "vip_fam0_HG002_HG003_chunk_0_snv.vcf.gz")
args+=("--output_vcf_parent2" "vip_fam0_HG002_HG004_chunk_0_snv.vcf.gz")
args+=("--make_examples_extra_args=include_med_dp=true")
mkdir tmp
TMPDIR=tmp apptainer exec --no-mount home ${APPTAINER_CACHEDIR}/deepvariant_deeptrio-1.9.0.sif /opt/deepvariant/bin/deeptrio/run_deeptrio "${args[@]}"
Only the input and output paths change between jobs. The jobs are scheduled across the nodes via Slurm.
The DeepTrio runs for our own data and the WES test data from the DeepVariant test case were done on a compute cluster with the following specifications:
Cores per node: 61
RAM per node (MB): 459098
The DeepTrio test for the GIAB data we have available was run on a cluster with the following specifications:
Cores: 120
RAM per node (MB): 499072
On each cluster the Slurm jobs were given the following resources on a single node:
Cores: 6
RAM (MB): 36864
In all cases we do not use the GPU.
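For context, each chunk is submitted roughly like this; the job name, wall time handling and wrapper script name below are placeholders rather than our exact setup:
#!/bin/bash
#SBATCH --job-name=deeptrio_chunk_0   # one job per region chunk (placeholder name)
#SBATCH --cpus-per-task=6             # matches --num_shards 6
#SBATCH --mem=36864                   # MB, as listed above
bash run_deeptrio_chunk_0.sh          # placeholder wrapper containing the deeptrio command above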
I'm not sure how you divided the data between nodes. If you simply divided into multiple consecutive blocks then the runtime could be very uneven between nodes. If this division is different for your local data and GIAB data then it could explain the difference in the total runtime. Could you run the same experiment on chr20 and use only one node?
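For example, keeping all your other arguments the same and only replacing the per-chunk BED should be enough; this is just a sketch of the change, not your exact setup:
args+=("--regions" "chr20")   # instead of regions_chunk_0.bed
Then run that single job on one node with both 1.8.0 and 1.9.0 and compare the runtimes.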
I think this is a very complex orchestration for execution. The best thing to do is to first find out the optimal resources you need to run, say, a single chromosome like chr20 or chr1 as @akolesnikov suggested, and base your larger chunking on that. It is very possible that you are throttling the CPUs with limited RAM, so they are taking much longer to execute.
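If Slurm accounting is enabled on your cluster, something along these lines on a completed job should show whether memory or CPU was the limiting factor (seff may or may not be installed on your system):
seff <jobid>
sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqMem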
@kishwarshafin I have rerun the same analyses with two different RAM settings per run. In the first set of runs each job had 36 GB of RAM available; in the second set each job had 80 GB of RAM available. In both cases the number of CPUs was the same. This resulted in no noteworthy difference in the time it took to complete each job, so the available RAM does not seem to be the issue.
@MatthieuBeukers is it possible for you to run the pipeline outside of Slurm to see if it's a scheduling issue on your HPC?
@kishwarshafin Unfortunately that is not possible, as the cluster we use is shared with other users.
@MatthieuBeukers we don't see this issue in the metrics we report, and it seems we cannot reproduce it, as it appears to be specific to your system. I would suggest consulting whoever manages your system's setup for further help.
I am closing this issue as it is not reproducible for us. I apologize.