bioconvert

fasta2bed

Open yoann-dufresne opened this issue 7 years ago • 2 comments

This converter should transform a FASTA file representing a scaffold (ordered contigs) into a genome annotation, following this specification:

Fasta:

    >contig_1 my first contig
    ACT
    >contig_2 my second contig
    GTGT

Bed:

    contig_1 0 3
    contig_2 3 7
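A minimal sketch of the proposed conversion, assuming that the contigs are laid end to end in the scaffold with no gap between them (as in the example, where contig_2 starts right after contig_1), that the first BED column is the first word of each FASTA header (as in the example), and that coordinates follow the half-open BED convention described later in this thread. Under that convention ACT (3 bases) gets start 0 and end 3, and GTGT (4 bases) gets start 3 and end 7. `read_fasta` and `fasta_to_bed` are hypothetical standalone helpers, not part of bioconvert's API.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first word of the header."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_bed(lines: Iterable[str]) -> str:
    """Hypothetical fasta2bed sketch: one BED3 line per contig in scaffold coordinates.

    Assumes contigs are concatenated with no gaps; starts are 0-based, ends exclusive.
    """
    offset = 0
    out = []
    for name, seq in read_fasta(lines):
        start, end = offset, offset + len(seq)
        out.append(f"{name}\t{start}\t{end}")
        offset = end  # the next contig starts right after this one
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    fasta = [">contig_1 my first contig", "ACT", ">contig_2 my second contig", "GTGT"]
    print(fasta_to_bed(fasta), end="")
```

On the example above this prints two tab-separated lines, `contig_1 0 3` and `contig_2 3 7`. If real scaffolds contain gaps (runs of N, or explicit spacing between contigs), the offset update would need to account for them.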

yoann-dufresne, Jul 17 '18 12:07

At some point we should define precisely what BED means. We have BED files with 3, 6, or 12 columns. Here, the BED file has 3 columns. What do the second and third columns mean here? This should be in the class documentation. Thanks.
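For reference, the 3-, 6- and 12-column flavours correspond to the field names of the UCSC BED specification, in a fixed order: BED6 adds name, score and strand to the three required fields, and BED12 adds six more. The sketch below assumes tab-separated lines; `bed_record` is a hypothetical helper that only labels the columns of one line. Because the optional fields are positional, a line with n columns simply uses the first n names, so the same code covers BED3, BED6, BED12 and the intermediate widths.

```python
from typing import Dict

# Field names from the UCSC BED specification, in their fixed order.
BED_FIELDS = (
    "chrom", "chromStart", "chromEnd",          # BED3: always required
    "name", "score", "strand",                  # BED6 adds these three
    "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",  # BED12 adds the remaining six
)


def bed_record(line: str) -> Dict[str, str]:
    """Label the columns of one tab-separated BED line (hypothetical helper).

    A line with n columns uses the first n field names; anything outside
    3..12 columns is rejected.
    """
    cols = line.rstrip("\n").split("\t")
    if not 3 <= len(cols) <= len(BED_FIELDS):
        raise ValueError(f"expected 3 to 12 BED columns, got {len(cols)}")
    return dict(zip(BED_FIELDS, cols))


if __name__ == "__main__":
    print(bed_record("contig_1\t0\t3"))
    print(bed_record("chr1\t100\t200\tgene_a\t0\t+"))
```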

cokelaer, Aug 01 '18 05:08

BED has optional columns, but the first 3 should always be present and are well defined.

The first three required BED fields are:

  1. chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
  2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
  3. chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
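The half-open convention in item 3 means that a feature's length is simply chromEnd - chromStart, and that two adjacent features share a boundary value. The sketch below uses a hypothetical `BedInterval` type (not from any library) to check the UCSC "first 100 bases" example and to show the usual conversion from 1-based, inclusive coordinates (as used, for example, by GFF), which only shifts the start by one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedInterval:
    """0-based start, exclusive end, as in the BED chromStart/chromEnd fields."""
    chrom: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if not 0 <= self.start <= self.end:
            raise ValueError("BED requires 0 <= chromStart <= chromEnd")

    @property
    def length(self) -> int:
        return self.end - self.start  # no +1: the chromEnd base is excluded

    def contains(self, pos: int) -> bool:
        """True if the 0-based position pos lies inside the interval."""
        return self.start <= pos < self.end

    @classmethod
    def from_one_based_inclusive(cls, chrom: str, first: int, last: int) -> "BedInterval":
        """Convert a 1-based, inclusive range (e.g. a GFF feature) to BED coordinates."""
        return cls(chrom, first - 1, last)


if __name__ == "__main__":
    first_100 = BedInterval("chr3", 0, 100)
    print(first_100.length, first_100.contains(99), first_100.contains(100))
    print(BedInterval.from_one_based_inclusive("chr3", 1, 100) == first_100)
```

Running it prints `100 True False` and then `True`: the interval covers bases 0 to 99, and the 1-based range 1..100 maps to the same BED interval.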

(see https://genome.ucsc.edu/FAQ/FAQformat.html#format1)

blaiseli, Aug 02 '18 13:08