CNV calling
Dear @tobiasrausch,
Hi, I'm Oh.
I ran delly for germline CNV calling.
I have a question about the genotypes of my called CNVs (c1.cnv.bcf) and of the merged calls (merged.bcf).
ref_fasta=${home_dir}/Reference/Homo_sapiens_assembly38.fasta
map_file=${home_dir}/Tools/delly/map/Homo_sapiens.GRCh38.dna.primary_assembly.fa.r101.s501.blacklist.gz
### call CNV
delly cnv \
-o c1.cnv.bcf \
-g ${ref_fasta} \
-m ${map_file} \
${sample_name}_recal.bam
### Merge CNV into a unified site list
code not shown
### Genotype CNVs for each sample
code not shown
### Merge genotype using bcftools
bcftools merge -m id -O b -o merged.bcf c1.geno.bcf c2.geno.bcf ... c100.geno.bcf
### Filter for germline CNVs
delly classify -f germline -o filtered.bcf merged.bcf
All CNV genotypes are "./.".

As far as I know, "./." means no call.
I only need "RDCN" anyway, so does it matter?
Many thanks.
Oh.
Yes, please use CN or RDCN. The long answer is:
For copy-number variants, delly currently does not use the GT field because that field is commonly used for hom. ALT (1/1), het. (0/1), and hom. REF (0/0) genotypes. For copy-number variants, I do not know the allelic distribution. For instance, if the total copy number of a segment is 8, the allelic copy numbers could be 4 and 4, or 8 and 0, or 1 and 7, ...
Because of that, delly only outputs the total copy number in FORMAT:CN and the likelihoods for each copy-number state in FORMAT:CNL.
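Since GT stays "./." for CNVs, the per-sample copy number has to be read from the CN key of the FORMAT column instead. A minimal sketch of doing that with awk on plain VCF text follows. The function name `extract_cn` is made up for illustration, and it assumes the records carry an `END=` entry in INFO and a `CN` key in FORMAT, as described above.

```shell
# Hypothetical helper: print CHROM, POS, INFO/END and the FORMAT/CN value
# of every sample, reading uncompressed VCF text from stdin.
extract_cn() {
  awk -F'\t' '
    /^#/ { next }                          # skip all header lines
    {
      # find END=... in the INFO column (field 8); "." if absent
      end = "."
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^END=/) end = substr(info[i], 5)

      # find the position of the CN key in the FORMAT column (field 9)
      cn_idx = 0
      m = split($9, keys, ":")
      for (i = 1; i <= m; i++)
        if (keys[i] == "CN") cn_idx = i

      # sample columns start at field 10; emit "." when CN is missing
      line = $1 "\t" $2 "\t" end
      for (s = 10; s <= NF; s++) {
        v = "."
        if (cn_idx > 0 && split($s, vals, ":") >= cn_idx) v = vals[cn_idx]
        line = line "\t" v
      }
      print line
    }'
}
```

Assuming bcftools is installed, the merged file from this thread could be fed in with `bcftools view merged.bcf | extract_cn`. The same loop works for RDCN by changing the key name it searches for.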
Dear @tobiasrausch,
Hi, I'm Oh.
Thanks for your reply.
Have a nice day!
Oh.
Hey, I ran the same commands. However, delly reports that all my samples have low coverage and that I therefore need to increase the scanning window.
I set "-w 50000" and then it works. However, all the variants are shown as "N".
Do these Ns stand for long sequences, or do they represent the "N" base?
If it is the latter, how do I exclude these reference "N" bases?
Also, when I run
delly classify -f germline -o filtered.bcf c1.bcf
the result is empty. What did I do wrong?
Thank you.
I am sorry, I still need to fix the N reference nucleotides; that is on the to-do list. For now, rely on POS and INFO/END for the size of the CNV; FORMAT/CN shows the estimated copy number. The classify subcommand requires a multi-sample BCF file.
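Both points above can be checked on plain VCF text with a few lines of awk. This is a rough sketch. The function names `count_samples` and `cnv_sizes` are made up for illustration, and the size is taken as END minus POS, which is an assumption about the coordinate convention (add 1 if you prefer inclusive coordinates).

```shell
# Hypothetical helper: count the sample columns of VCF text on stdin,
# i.e. the columns after FORMAT in the #CHROM header line.
count_samples() {
  awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0) }'
}

# Hypothetical helper: print CHROM, POS, END and the size (END - POS)
# for every record of VCF text on stdin that has END=... in INFO.
cnv_sizes() {
  awk -F'\t' '
    /^#/ { next }                          # skip all header lines
    {
      end = ""
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^END=/) end = substr(info[i], 5)
      # records without END are skipped rather than guessed at
      if (end != "") print $1 "\t" $2 "\t" end "\t" (end - $2)
    }'
}
```

Assuming bcftools is installed, `bcftools view c1.bcf | count_samples` printing 1 would indicate a single-sample file, which does not meet the multi-sample requirement of classify, and `bcftools view c1.bcf | cnv_sizes` lists the CNV sizes without relying on the REF column.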
Thank you for your response. I still have some questions. I want to find the germline CNVs for each individual sample. Is this possible? Why do we need multiple samples? I haven't found any tool that does this; the ones I know of, such as DELLY, need multiple samples.
Hello, excuse me. I am currently using delly to detect SVs. In the resulting .vcf file, the GT value of some records is '.' or './.', while other records have a genotype such as 1/1. Should I filter out the SVs whose GT is '.' or './.'? Looking forward to hearing from you, and wishing you all the best, @tobiasrausch.