minigraph icon indicating copy to clipboard operation
minigraph copied to clipboard

Cannot add apparent simple SV on pangenome

Open LzzCongee opened this issue 1 year ago • 0 comments

I simulated some dup and inversion on hg38 chromosomal 10 , calling it simSV.fasta, and then construct pangenome with hg38Chr10.fasta, but found that the sv simualted cannot be added td the final gfa file, which only contains one node with full sequence in hg38Chr10.fasta. finally, I found that problem is due to the function static void gg_merge_seg , when merging interval which probably contain variations; The interval was chosen incorrectly when calculating the score between two intervals

static void gg_merge_seg(const ed_intv_t *intv, int32_t n_ss, mg_msseg_t *ss)
{
                ······ other code ······
                for (i = s0->en + 1; i < s1->st; ++i)
                        mid += intv[i].sc;
                //fprintf(stderr, "XX\t%d\t%d\t%d\t%d\t%d\t%d\n", j, s0->sc, mid, s1->sc, s0->en+1, s1->st);
                if (-mid < s0->sc * 0.2 && -mid < s1->sc * 0.2) { // FIXME: mid is sometimes 0
                        s0->en = s1->en, s0->sc += s1->sc + mid;
                        s1->st = s1->en, s1->sc = 0;
                } else j0 = j;
        }
}

The starting value of the for loop was probably wrong during the calculation of the above code, according to the below in function gg_simple_cigar :

    // find the initial positions
    st = ss[j].st, en = ss[j].en; // this is a CLOSED interval
    if (st == en) continue;
    **is = &intv[st], ie = &intv[en - 1];**

It means that a interval can be defined by [st,en) which is a left open and right closed interval; And thus when calculating the score between two intervals ,the initial value should be s0->en

Image

LzzCongee avatar Feb 21 '25 07:02 LzzCongee