pyfastx
Sequence composition
Hi,
I am using angsd to produce FASTA sequences; its output is automatically written as gzipped FASTA files.
If I open the FASTA file with the R package Biostrings, the sequence composition looks like this:
```r
> library(Biostrings)
> dna <- readDNAStringSet("WSBg.asm5.fa.gz")
> alphabetFrequency(dna[1])
            A        C        G        T M R W S Y K V H D B        N - + .
[1,] 53974765 37689595 37636633 53870814 0 0 0 0 0 0 0 0 0 0 11982472 0 0 0
```
However, for the same .fa.gz file, pyfastx reports this sequence composition:
```python
>>> import pyfastx
>>> fa = pyfastx.Fasta("WSBg.asm5.fa.gz")
>>> s1 = fa['chr1']
>>> s1.composition
{'\x00': 162258284,
 'A': 8774131,
 'C': 5629514,
 'G': 5628512,
 'N': 4131093,
 'T': 8732745}
```
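A quick arithmetic check on the two outputs above: the per-character counts add up to exactly the same length for `chr1` in both tools. That is consistent with the `'\x00'` entries standing in for bases that were not decoded correctly, rather than being extra characters. The counts are copied verbatim from the two outputs; the variable names are illustrative only.

```python
# Per-character counts for chr1, copied from the Biostrings and pyfastx outputs above.
biostrings_counts = {"A": 53974765, "C": 37689595, "G": 37636633,
                     "T": 53870814, "N": 11982472}
pyfastx_counts = {"\x00": 162258284, "A": 8774131, "C": 5629514,
                  "G": 5628512, "N": 4131093, "T": 8732745}

total_r = sum(biostrings_counts.values())
total_py = sum(pyfastx_counts.values())
print(total_r, total_py)  # 195154279 195154279

# Characters pyfastx reported as real bases (everything except NUL).
decoded_py = total_py - pyfastx_counts["\x00"]
# The shortfall against Biostrings equals the NUL count exactly.
print(total_r - decoded_py)  # 162258284
```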
Could you please explain what the '\x00' key means?
Could it be that pyfastx cannot correctly index and read these gzipped files?
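To test the gzip hypothesis independently of pyfastx, one could tally the same record using only Python's standard library. This is a minimal sketch, assuming the record name is the first whitespace-delimited token of the header line (as with `chr1` here); `record_composition` is a hypothetical helper, not part of pyfastx. Python's `gzip` module also reads multi-member gzip streams, so it should cope with both plain gzip and BGZF-compressed files.

```python
import gzip
from collections import Counter


def record_composition(path: str, name: str) -> Counter:
    """Count characters in one FASTA record of a gzipped file (hypothetical helper).

    Assumes the record name is the first whitespace-delimited token after '>'.
    """
    counts: Counter = Counter()
    in_record = False
    with gzip.open(path, "rt") as handle:  # text mode decompresses on the fly
        for line in handle:
            if line.startswith(">"):
                if in_record:
                    break  # reached the next header: requested record is done
                header = line[1:].split()
                in_record = bool(header) and header[0] == name
                continue
            if in_record:
                # strip() removes newlines/whitespace but keeps any NUL bytes
                counts.update(line.strip())
    return counts
```

For example, `record_composition("WSBg.asm5.fa.gz", "chr1")` should reproduce the Biostrings counts if the file itself is intact; a mismatch with `s1.composition` would then point at how pyfastx decompresses the file rather than at the file.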
Thank you in anticipation
Best regards
Kristian
Thanks, I will fix this bug.
Fixed in v2.1.0.