
merge error

Open lee039 opened this issue 6 years ago • 2 comments

Hi,

I tried to merge files from 206 samples and encountered this error. Do you have any idea where I should look? The merge job ends prematurely and leaves behind "merged.lsort.vcf" and "merged.sites.vcf.gz".

"merged.lsort.vcf" is the file that contains all the variants from the 206 samples, and "merged.sites.vcf.gz" is the sorted file. I can see where it stops.

### error message ###
[smoove] 2020/02/23 12:14:38 starting with version 0.2.5
[smoove] 2020/02/23 12:14:38 merging 206 files
[smoove] 2020/02/23 12:14:38 finished sorting 206 files; merge starting.
[smoove] 2020/02/23 12:37:37 Traceback (most recent call last):
  File "/home/u/f056598/miniconda2/bin/svtools", line 11, in <module>
[smoove] 2020/02/23 12:37:37 sys.exit(main())
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/cli.py", line 79, in main
[smoove] 2020/02/23 12:37:37 sys.exit(args.entry_point(args))
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 622, in run_from_args
[smoove] 2020/02/23 12:37:37 weighting_scheme=args.weighting_scheme)
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 587, in l_cluster_by_line
[smoove] 2020/02/23 12:37:37 v_id = r_cluster(BP_l, sample_order, v_id, use_product, vcf, vcf_out, include_genotypes, weighting_scheme)
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 524, in r_cluster
[smoove] 2020/02/23 12:37:37 v_id = merge(BP_r, sample_order, v_id, use_product, vcf, vcf_out, include_genotypes, weighting_scheme)
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 493, in merge
[smoove] 2020/02/23 12:37:37 var=create_merged_variant(BP, cliq, v_id, vcf, use_product, weighting_scheme)
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 236, in create_merged_variant
[smoove] 2020/02/23 12:37:37 new_start_L, new_start_R, p_L , p_R, ALG = combine_pdfs(BP, c, use_product, weighting_scheme)
  File "/home/u/f056598/miniconda2/lib/python2.7/site-packages/svtools/lmerge.py", line 180, in combine_pdfs
[smoove] 2020/02/23 12:37:37 if (a_L[c_i][pmax_i_L] == 0) or (a_R[c_i][pmax_i_R] == 0):
IndexError: list index out of range
2020/02/23 12:37:37 exit status 1

Thanks a lot for your help! :)

lee039 · Feb 23 '20 14:02

Hi @lee039, were you able to solve this? I am getting the same error: IndexError: list index out of range.

stubbsrl · Nov 02 '21 13:11

Hi,

I repeated the merging step several times, but it aborted at the exact same position each time. You can see where it stopped by looking at the tail of the intermediate files ("merged.lsort.vcf" and "merged.sites.vcf.gz").

The position that comes right after the last one written in "merged.lsort.vcf" and "merged.sites.vcf.gz" was causing the issue. I think the problem is that during merging, smoove takes into account the probability of each breakpoint position, and if the breakpoint resolution is low (e.g. due to repeats) this problem can happen. In my case, I inspected the position where the problem occurred (using IGV), and indeed the deletion was found in ~15 animals, but the breakpoint was +/- 300 bp (resolution too poor for smoove to process? I am not entirely sure).
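For anyone who wants to script the "look at the tail" step, here is a minimal sketch. The function name `last_site` is made up for illustration, and it assumes the intermediate file is a plain or gzip-compatible (bgzip) VCF with CHROM and POS in the first two tab-separated columns. It only reports the last record that was written; the problematic site is whatever comes after it in the input.

```shell
# Hypothetical helper: print CHROM<TAB>POS of the last non-header record
# in a VCF. Works on plain text and on gzip/bgzip-compressed files.
last_site() {
    local vcf="$1"
    if [[ "$vcf" == *.gz ]]; then
        # A truncated archive may make gzip complain; records read so far are still used.
        gzip -dc "$vcf" 2>/dev/null
    else
        cat "$vcf"
    fi | awk -F'\t' '
        !/^#/ { chrom = $1; pos = $2 }                    # remember the latest record
        END   { if (chrom != "") print chrom "\t" pos }   # print nothing if there were no records
    '
}
```

Usage would be something like `last_site merged.sites.vcf.gz`, run in the output directory of the failed merge.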

Thus, I parsed out all the positions that caused the problem and saved them as $1 (chr) and $2 (pos) in a text file. As I mentioned, I had ~15 samples where a deletion was discovered; however, none of them had the same starting position. This probably indicates that the deletion is of low quality... I then used a for loop and bcftools to eliminate these positions from all VCF files using the code below.

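for vcf in ${PATH_TO_THE_VCF_FILES}/*vcf.gz; do
    # sample name: 11th '/'-separated field of the path, up to the first '-' (depends on your directory layout)
    sname=$(echo "$vcf" | cut -d'/' -f11 | cut -d'-' -f1)
    # drop the problem sites (the ^ prefix excludes the positions listed in the file)
    bcftools view -T ^$Problem_sites "$vcf" -O z -o results-smoove/$sname.subset.vcf.gz
    bcftools index -c results-smoove/$sname.subset.vcf.gz
done
smoove merge --name merged -f $FASTA --outdir results-smoove/ results-smoove/*.subset.vcf.gz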
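In case it helps, the text file of problem positions could be built along these lines. This is only a sketch of one way to do it, not necessarily how it was done here: the function name `sites_in_window` and the window coordinates in the usage line are made up, and it assumes you already know roughly where the merge stops. It collects every record that starts inside a window, across all per-sample VCFs, and writes tab-separated CHROM and POS, which is the layout `bcftools view -T` reads.

```shell
# Hypothetical helper: list CHROM<TAB>POS for every record starting inside
# chrom:start-end in the given VCFs (plain or gzipped), sorted and de-duplicated.
sites_in_window() {
    local chrom="$1" start="$2" end="$3"
    shift 3
    local vcf
    for vcf in "$@"; do
        if [[ "$vcf" == *.gz ]]; then gzip -dc "$vcf"; else cat "$vcf"; fi
    done | awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" '
        # keep non-header records on the requested chromosome inside the window
        !/^#/ && $1 == c && $2 >= s && $2 <= e { print $1 "\t" $2 }
    ' | sort -t $'\t' -k1,1 -k2,2n -u
}
```

For example, `sites_in_window chr5 120000 121000 ${PATH_TO_THE_VCF_FILES}/*vcf.gz > problem_sites.txt` (coordinates invented), then point `$Problem_sites` at that file.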

Afterwards, the merging went without problems. Hope this works for you! :)

lee039 · Nov 08 '21 09:11