seq-scripts

scripts for sequence and feature conversion, annotation, analysis ...

** Overview :TOC:

  - [[#seq-frag----ref-based-simulation-of-readscontigs][seq-frag -- ref-based simulation of reads/contigs]]
  - [[#bio2svg----plot-bamgffbed-tracks-to-svg][bio2svg -- plot bam/gff/bed tracks to svg]]

** seq-frag -- ref-based simulation of reads/contigs

Simulate fragment libraries (Illumina SE/PE/MP, PacBio, contigs) from reference sequences. Error models are not currently supported.

***** Dependencies

  • [[https://github.com/BioInf-Wuerzburg/perl5lib-Fastq][perl5lib-Fastq]], [[https://github.com/BioInf-Wuerzburg/perl5lib-Fasta][perl5lib-Fasta]]
  • [[http://search.cpan.org/~grommel/Math-Random-0.70/Random.pm][Math::Random]]

#+BEGIN_SRC sh
cpanm Math::Random

# wherever you prefer
git clone https://github.com/BioInf-Wuerzburg/perl5lib-Fasta.git
git clone https://github.com/BioInf-Wuerzburg/perl5lib-Fastq.git

# assuming both repositories keep their modules under lib/
export PERL5LIB=/path/to/perl5lib-Fasta/lib:/path/to/perl5lib-Fastq/lib:$PERL5LIB
#+END_SRC

***** Usage
#+BEGIN_SRC sh
seq-frag MODE -l LENGTH -c COVERAGE [options ..] < FASTA

# 50X 100bp single end reads
seq-frag se -l 100 -c 50 < genome.fa > read.fq

# 50X 100bp paired end, 180bp insert
seq-frag pe -l 100 -c 50 -i 180 < genome.fa | interleaved-split 1>reads_1.fq 2>reads_2.fq

# 50X mate pair, 2000bp insert
seq-frag mp -l 100 -c 50 -i 2000 < genome.fa | interleaved-split 1>reads_1.fq 2>reads_2.fq

# 20X pacbio style fragments, mean length 2000bp
seq-frag pacbio -l 2000 -c 20 < genome.fa > pb-reads.fq

# for more details
seq-frag --help
#+END_SRC
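
As a rough guide for choosing =-c= and =-l=: under the usual definition of coverage (sequenced bases divided by reference bases), the number of single-end reads needed is about coverage x reference length / read length. The sketch below computes that estimate with awk. It is generic arithmetic and not necessarily how seq-frag derives its read count; the helper name =est_reads= is made up for illustration.

#+BEGIN_SRC sh
# est_reads COVERAGE READ_LENGTH < FASTA
# Hypothetical helper: estimate the number of reads for a target coverage.
est_reads() {
    awk -v cov="$1" -v len="$2" '
        /^>/ { next }                 # skip FASTA header lines
        { total += length($0) }       # sum up sequence bases
        END { printf "%d\n", int(cov * total / len + 0.5) }  # round to nearest
    '
}

# toy reference with 10 bases: 5X coverage with 2bp reads needs 25 reads
printf '>chr1\nACGTA\nCGTAC\n' | est_reads 5 2
#+END_SRC

On a real reference this would be called as =est_reads 50 100 < genome.fa=.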

***** Sample
#+BEGIN_SRC sh
seq-frag mp -l 100 -c 1 -i 2000 < ref.fa | interleaved-split 1> r_1.fq 2> r_2.fq
seq-frag mp -l 100 -c 1 -i 2000 -s < ref.fa | interleaved-split 1> s_1.fq 2> s_2.fq
#+END_SRC

Mapped with [[https://github.com/lh3/bwa][bwa mem]] and visualized with [[https://www.broadinstitute.org/igv/][IGV]]:

[[etc/seq-frag-mp.png]]
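
Judging from the =1>= and =2>= redirections in the paired-end and mate-pair commands above, the paired modes emit one interleaved stream that =interleaved-split= separates onto stdout and stderr. Where that helper is not at hand, the same separation can be sketched with awk, assuming strict four-line FASTQ records with mates alternating (first mate, then second mate). The function name =split_interleaved= is made up here, and this is not the implementation of =interleaved-split=.

#+BEGIN_SRC sh
# split_interleaved < INTERLEAVED_FASTQ 1> MATES_1 2> MATES_2
# Hypothetical stand-in; assumes 4-line records with alternating mates.
split_interleaved() {
    awk '{
        if (int((NR - 1) / 4) % 2 == 0) print   # records 1, 3, 5, ... go to stdout
        else print > "/dev/stderr"              # records 2, 4, 6, ... go to stderr
    }'
}

# e.g.: seq-frag pe -l 100 -c 50 -i 180 < genome.fa | split_interleaved 1> reads_1.fq 2> reads_2.fq
#+END_SRC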

** bio2svg -- plot bam/gff/bed tracks to svg

Plot mappings (bam), features and annotations (gff, bed) along sequences to high quality SVGs.

***** Dependencies

  • [[http://search.cpan.org/~ronan/SVG-2.33/][SVG-2.33]]

***** Usage
#+BEGIN_SRC sh
git clone https://github.com/thackl/perl5lib-Gff.git
git clone https://github.com/thackl/perl5lib-SVG-Bio.git

export PERL5LIB=/path/to/perl5lib-Gff/lib:/path/to/perl5lib-SVG-Bio/lib:$PERL5LIB;

bio2svg --fasta FA --region REGION --gff GFF --bam BAM > SVG
#+END_SRC

Region can be either a sequence ID (Chr4) or an ID with range (Chr4:521-15521).
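
To make the two accepted forms concrete, the sketch below splits a region string into ID, start and end using POSIX parameter expansion. It only illustrates the syntax described above and is not taken from bio2svg; the function name =parse_region= is hypothetical, and IDs that themselves contain a colon are not handled.

#+BEGIN_SRC sh
# parse_region REGION
# Hypothetical helper: print "ID" or "ID<TAB>START<TAB>END".
parse_region() {
    region=$1
    case $region in
        *:*-*)
            id=${region%%:*}       # text before the first colon
            range=${region#*:}     # text after the first colon
            start=${range%%-*}     # number before the dash
            end=${range#*-}        # number after the dash
            printf '%s\t%s\t%s\n' "$id" "$start" "$end"
            ;;
        *)
            printf '%s\n' "$region"   # plain ID: the whole sequence
            ;;
    esac
}

parse_region Chr4
parse_region Chr4:521-15521
#+END_SRC

Because only the part after the colon is searched for a dash, IDs with hyphens such as =MaV-is-CrEc-001= pass through unchanged.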

***** Sample
#+BEGIN_SRC sh
bio2svg --width 10000 --fa MaV-is-CrEc-001.ctg.fa --region MaV-is-CrEc-001 \
  --gff mav-regions.gff --gff MaV-gen-2.0.maker.lifted.gff \
  --gff CrEc-gen-dp-1.0.all.lifted.gff \
  --bam PCR~MaV-is-CrEc-001.ctg.bam \
  --bam pr-all~MaV-is-CrEc-001.ctg-support.bam \
  --bam pr-all~MaV-is-CrEc-001.ctg-bridge.bam \
  > MaV-is-CrEc-001.ctg.svg
#+END_SRC

[[etc/bio2svg-sample.png]] [[etc/bio2svg-sample.svg]]