d4-format
Ignore "chr" prefix when running stats using an intervals file
Hello! It's not a huge deal, but today it took us a while to understand why the stats always returned 0 for one of our samples. Basically, if the chromosome names in the d4 file have the "chr" prefix, then the chromosomes in the intervals file must have the prefix too; otherwise every region is reported as 0, with no warning. Since we work with different pipelines, which apparently name chromosomes differently in their d4 output, it would be nice to have this case handled directly in d4tools.
Thanks in advance!
(base) chiararasi@n159-p41 d4_data % cat intervals.bed
1 11785723 11806455
11 72189558 72196323
17 28394642 28407197
18 6941742 7117797
5 80626226 80654983
(base) chiararasi@n159-p41 d4_data % d4tools stat --region intervals.bed mildlywittybat.per-base.d4 --stat mean
1 11785723 11806455 0
11 72189558 72196323 0
17 28394642 28407197 0
18 6941742 7117797 0
5 80626226 80654983 0
(base) chiararasi@n159-p41 d4_data % cat intervals_with_chr.bed
chr1 11785723 11806455
chr11 72189558 72196323
chr17 28394642 28407197
chr18 6941742 7117797
chr5 80626226 80654983
(base) chiararasi@n159-p41 d4_data % d4tools stat --region intervals_with_chr.bed mildlywittybat.per-base.d4 --stat mean
chr1 11785723 11806455 29.785597144510902
chr11 72189558 72196323 31.807538802660755
chr17 28394642 28407197 28.50115491835922
chr18 6941742 7117797 31.84388969356167
chr5 80626226 80654983 25.601175365997843
(base) chiararasi@n159-p41 d4_data %
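
For anyone hitting the same thing in the meantime, here is a rough sketch of the workaround we use: rewriting the intervals so the chromosome column matches the naming in the d4 file. It is plain Python and not part of d4tools. The function name `match_chr_prefix` and the `use_chr_prefix` flag are made up for illustration, and you need to know beforehand whether your d4 file uses the prefix. It only adds or strips a leading "chr", so it does not handle other naming differences such as "MT" versus "chrM".

```python
def match_chr_prefix(bed_lines, use_chr_prefix):
    """Return BED lines whose chromosome column has (or lacks) a "chr" prefix.

    Hypothetical helper, not part of d4tools. `use_chr_prefix` should be True
    when the d4 file names chromosomes "chr1", "chr2", ... and False when it
    names them "1", "2", ...
    """
    fixed = []
    for line in bed_lines:
        stripped = line.rstrip("\n")
        # Leave blank lines and BED comment/header lines untouched.
        if not stripped or stripped.startswith(("#", "track ", "browser ")):
            fixed.append(stripped)
            continue
        # The first whitespace-delimited field is the chromosome name;
        # everything after it (separator included) is kept as-is.
        chrom = stripped.split()[0]
        rest = stripped[len(chrom):]
        has_prefix = chrom.startswith("chr")
        if use_chr_prefix and not has_prefix:
            chrom = "chr" + chrom
        elif not use_chr_prefix and has_prefix:
            chrom = chrom[len("chr"):]
        fixed.append(chrom + rest)
    return fixed


if __name__ == "__main__":
    # Two of the intervals from the session above, without the prefix.
    intervals = ["1 11785723 11806455", "11 72189558 72196323"]
    for row in match_chr_prefix(intervals, use_chr_prefix=True):
        print(row)
```

Writing the returned lines to a new file and passing that file to `--region` gives the same result as the `intervals_with_chr.bed` run above.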