
Mixed speed improvements with deeptrio

Open MatthieuBeukers opened this issue 7 months ago • 7 comments

Dear deepvariant team,

We ran both DeepTrio 1.8.0 and 1.9.0 on GIAB data as well as on our own locally produced data. In the first case, DeepTrio 1.9.0 took about twice as long to complete as DeepTrio 1.8.0; in the second case we saw a minor speed improvement. Both cases were run on the same hardware. With the data used in your Illumina WES test case we did see the improvement noted in the release notes. Do you have an explanation for our findings?

Kind regards

MatthieuBeukers avatar Jun 10 '25 11:06 MatthieuBeukers

Hi @MatthieuBeukers,

Could you please include the exact command that you used to run your evaluations? In addition please specify the hardware specs.

akolesnikov avatar Jun 11 '25 18:06 akolesnikov

Hi @akolesnikov, we run DeepVariant and DeepTrio as part of our pipeline on a compute cluster with multiple nodes, using roughly the following command:

local args=()
args+=("--model_type" "WES")
args+=("--ref" "/apps/data/vip/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna")
args+=("--reads_child" "vip_fam0_HG002_validated.bam")
args+=("--reads_parent1" "vip_fam0_HG003_validated.bam")
args+=("--reads_parent2" "vip_fam0_HG004_validated.bam")
args+=("--sample_name_child" "HG002")
args+=("--sample_name_parent1" "HG003")
args+=("--sample_name_parent2" "HG004")
args+=("--output_gvcf_child" "vip_fam0_HG002_chunk_0_snv.g.vcf.gz")
args+=("--output_gvcf_parent1" "vip_fam0_HG002_HG003_chunk_0_snv.g.vcf.gz")
args+=("--output_gvcf_parent2" "vip_fam0_HG002_HG004_chunk_0_snv.g.vcf.gz")
args+=("--num_shards" "6")
args+=("--regions" "regions_chunk_0.bed")
args+=("--intermediate_results_dir" "intermediate_results")
# required vcf outputs that won't be used
args+=("--output_vcf_child" "vip_fam0_HG002_chunk_0_snv.vcf.gz")
args+=("--output_vcf_parent1" "vip_fam0_HG002_HG003_chunk_0_snv.vcf.gz")
args+=("--output_vcf_parent2" "vip_fam0_HG002_HG004_chunk_0_snv.vcf.gz")
args+=("--make_examples_extra_args=include_med_dp=true")

mkdir tmp
TMPDIR=tmp apptainer exec --no-mount home ${APPTAINER_CACHEDIR}/deepvariant_deeptrio-1.9.0.sif /opt/deepvariant/bin/deeptrio/run_deeptrio "${args[@]}"

Only the input and output paths change between jobs. The jobs are scheduled across the nodes via Slurm.

The DeepTrio runs for our own data and for the WES test data from the DeepVariant test case were done on a compute cluster with the following specifications: 61 cores per node, 459098 MB RAM per node.

The DeepTrio test for the GIAB data we have available was run on a cluster with the following specifications: 120 cores, 499072 MB RAM per node.

On each cluster, the Slurm jobs were given the following resources on a single node: 6 cores and 36864 MB RAM.

In all cases we do not use the GPU.
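
For completeness, a minimal sketch of how each job requests these resources from Slurm (the directives below are illustrative, not our exact submission script; the apptainer call is the same one shown above):

#!/bin/bash
#SBATCH --cpus-per-task=6      # matches --num_shards 6
#SBATCH --mem=36864M           # 36 GB per job

mkdir -p tmp
TMPDIR=tmp apptainer exec --no-mount home \
  "${APPTAINER_CACHEDIR}/deepvariant_deeptrio-1.9.0.sif" \
  /opt/deepvariant/bin/deeptrio/run_deeptrio "${args[@]}"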

MatthieuBeukers avatar Jun 12 '25 08:06 MatthieuBeukers

I'm not sure how you divided the data between nodes. If you simply split the genome into consecutive blocks, the runtime could be very uneven across nodes. If this division differs between your local data and the GIAB data, that could explain the difference in total runtime. Could you run the same experiment on chr20 only, using a single node?
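
For example, something along these lines, reusing the flags from your command above but restricted to chr20 (the 1.8.0 image name below is an assumption on my part; point it at whatever images you have cached and run once per version):

# Same invocation as above, on a single node, restricted to chr20.
# Use "--regions chr20" in place of "--regions regions_chunk_0.bed".
args+=("--regions" "chr20")

TMPDIR=tmp apptainer exec --no-mount home \
  "${APPTAINER_CACHEDIR}/deepvariant_deeptrio-1.8.0.sif" \
  /opt/deepvariant/bin/deeptrio/run_deeptrio "${args[@]}"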

akolesnikov avatar Jun 12 '25 18:06 akolesnikov

I think this is a very complex orchestration for execution. The best thing to do is to first find out the optimal resources you need to run, say, one chromosome such as chr20 or chr1 as @akolesnikov suggested, and base your larger chunking on that. It is very much possible that you are throttling the CPUs with limited RAM, so the jobs are taking much longer to execute.
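
One generic way to check this (a sketch, not specific to DeepTrio) is to wrap a single-chromosome run with GNU time so you can compare wall time and peak memory between the two versions under the same resource limits:

# Record wall time and peak resident memory for a single chr20 run.
# (Requires GNU time, usually /usr/bin/time; -v prints "Maximum resident set size".)
TMPDIR=tmp /usr/bin/time -v -o deeptrio_chr20_time.log \
  apptainer exec --no-mount home \
  "${APPTAINER_CACHEDIR}/deepvariant_deeptrio-1.9.0.sif" \
  /opt/deepvariant/bin/deeptrio/run_deeptrio "${args[@]}"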

kishwarshafin avatar Jun 12 '25 18:06 kishwarshafin

@kishwarshafin I have rerun the same analyses with two different RAM settings per run. In the first set of runs each job had 36 GB of RAM available; in the second set each job had 80 GB. In both cases the number of CPUs was the same. There was no noteworthy difference in how long each job took to complete, so the available RAM does not seem to be the issue.

MatthieuBeukers avatar Jun 17 '25 05:06 MatthieuBeukers

@MatthieuBeukers is it possible for you to run the pipeline outside of slurm to see if it's a scheduling issue on your HPC?

kishwarshafin avatar Jun 17 '25 05:06 kishwarshafin

@kishwarshafin Unfortunately that isn't possible, as the cluster we use is shared with other users.

MatthieuBeukers avatar Jun 17 '25 06:06 MatthieuBeukers

@MatthieuBeukers we don't see this issue in the metrics we report, and it seems we can't reproduce this behavior, which appears to be specific to your system. I would suggest consulting with whoever manages your system setup for further help.

I am closing this issue as it is not reproducible for us. I apologize.

kishwarshafin avatar Jun 23 '25 19:06 kishwarshafin