Primates

Complete assemblies of non-human primate genomes

Telomere-to-Telomere consortium primates project

T2T-Primates is a project of the Telomere-to-Telomere consortium and is led by the Makova, Phillippy, and Eichler labs. The project seeks to finish complete, diploid assemblies for key non-human primate species. The project is currently focused on gorilla, bonobo, chimpanzee, orangutan, and gibbon. Following the approach of the human T2T-CHM13 project, all species have been sequenced with high-coverage PacBio HiFi (>50x) and Oxford Nanopore ultra-long 100 kb+ (>30x) sequencing reads. For haplotype phasing, Dovetail Hi-C data was generated for all genomes and Strand-seq data is also expected. Parental Illumina data was collected for bonobo and gorilla, where familial trios were available.

Phase one of the project focused on completing the sex chromosomes (v1 release), and phase two focused on finishing the autosomes (v2 release). Version 2 assemblies for all species are now available and a comparative analysis is underway.

Data reuse and license

All data is released to the public domain (CC0) and we encourage its reuse. However, we are in the process of finishing and analyzing these genomes, so to avoid duplicating effort, we encourage you to contact us if you are interested in contributing. The following working groups have been formed: assembly, annotation, sex chromosomes, comparative and evolutionary genomics, segmental duplications, acrocentric chromosomes and rDNAs, satellite DNAs, mobile elements, and pangenomics.

Relevant citations:

  1. Makova K, et al. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv, 2023.

Assembly releases

v2.0 (November 2023)

Version 2 diploid assemblies were generated by Verkko with additional finishing and polishing steps to reach T2T. Chromosomes were named and oriented according to the prior cytogenetics literature for each species. For convenience, the "hsa" suffix in the chromosome names refers to the human homologous chromosome, where applicable. Gorilla and bonobo were phased using familial trios, so complete maternal and paternal haplotypes are available for these species. All other species were phased using Hi-C. With Hi-C phasing, each chromosome is completely phased, but it is not known which haplotype is maternal and which is paternal, so the higher quality haplotype was assigned to hap1 and the lower quality haplotype to hap2. All assemblies have been submitted to NCBI GenBank and are currently being processed. The curated and submitted versions can be downloaded from AWS in a variety of configurations:

There are a number of files within these directories with the following tags:

dip : diploid assembly including both haplotypes
chrEBV/MT/rDNA : consensus EBV, mitochondria, and rDNA contigs
analysis-dip : diploid assembly + MT + rDNA morph + EBV contigs
mat/pat : maternal and paternal haplotypes, with chrX in mat and chrY in pat
hap1/hap2 : hap1 and hap2 haplotypes, with chrX in hap1 and chrY in hap2
pri/alt : hap1 + chrY (primary), hap2 - chrY (alternate)
unloc : any unlocalized sequences from unresolved gaps
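
The relationship between the hap1/hap2 and pri/alt tags can be illustrated with a minimal sketch. It works on lists of sequence names only, and the function name and the sex chromosome names used in the example are hypothetical, not taken from the released files:

```python
from typing import List, Tuple


def split_primary_alternate(hap1: List[str], hap2: List[str]) -> Tuple[List[str], List[str]]:
    """Build pri/alt name lists from hap1/hap2 name lists.

    Follows the tag definitions above: pri = hap1 + chrY, alt = hap2 - chrY.
    Assumes (hypothetically) that every name starts with "chr<ID>" followed
    by an underscore-separated suffix.
    """
    def is_chr_y(name: str) -> bool:
        return name.split("_")[0] == "chrY"

    chr_y = [name for name in hap2 if is_chr_y(name)]
    primary = list(hap1) + chr_y
    alternate = [name for name in hap2 if not is_chr_y(name)]
    return primary, alternate


# Illustrative names only; the chrX/chrY names are assumptions.
pri, alt = split_primary_alternate(
    ["chr1_hap1_hsa1", "chrX_hap1_hsaX"],
    ["chr1_hap2_hsa1", "chrY_hap2_hsaY"],
)
```

With these inputs, the primary list holds every hap1 sequence plus chrY, so it carries both sex chromosomes, while the alternate list holds the remaining hap2 sequences.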

Files with the date tags 20231122 and 20231205 are the v2 assemblies that were submitted to GenBank. Both diploid and primary assemblies were submitted, but only the primary assemblies containing both chrX and chrY will be annotated and serve as a linear reference for each species. All primary haplotype chromosomes are assembled "T2T" (complete, gapless, telomere on both ends) with the exception of the large rDNA arrays; one additional gap in mPanPan1 chr22_pat_hsa21, mPonAbe1 chr18_hap1_hsa16, and mPonAbe1 chr1_hap1_hsa1; and one missing telomere from mPonPyg2 chr21_hap1_hsa20.
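
Chromosome names such as chr22_pat_hsa21 combine the species chromosome, the haplotype, and, where applicable, the human homolog. The sketch below splits such names into their parts. The pattern is inferred only from the names quoted above and is an assumption rather than an official naming specification; the helper names are hypothetical:

```python
import re
from typing import NamedTuple, Optional


class ChromName(NamedTuple):
    chrom: str                    # species chromosome ID, e.g. "22"
    haplotype: str                # "mat", "pat", "hap1", or "hap2"
    human_homolog: Optional[str]  # value of the "hsa" suffix, if present


# Assumed layout: chr<ID>_<haplotype>[_hsa<human ID>]
_NAME_PATTERN = re.compile(
    r"^chr(?P<chrom>[A-Za-z0-9]+)"
    r"_(?P<hap>mat|pat|hap1|hap2)"
    r"(?:_hsa(?P<hsa>[A-Za-z0-9]+))?$"
)


def parse_chrom_name(name: str) -> ChromName:
    """Split a chromosome name into chromosome, haplotype, and human homolog."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized chromosome name: {name!r}")
    # The "hsa" group is None when the suffix is absent.
    return ChromName(match["chrom"], match["hap"], match["hsa"])


parsed = parse_chrom_name("chr18_hap1_hsa16")
```

Names without an "hsa" suffix parse with human_homolog set to None, and names that do not fit the assumed layout (for example, unlocalized sequences) raise a ValueError rather than being guessed at.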

*Symphalangus syndactylus (mSymSyn1, siamang gibbon) has been updated to v2.1 with date tag 20240514. Chromosomes 12 and 19 were swapped to match the prior chromosome assignments for this species.

v1.0 (December 2022)

Version 1 diploid assemblies were generated with Verkko, and contigs were chromosome-assigned and oriented by alignment to the previous references. Both X and Y chromosomes are complete for all species listed. Gorilla and bonobo were phased using familial trios, and all others using Hi-C. To avoid confusion, we have removed links to these assemblies, but they still exist in the AWS bucket.

Downloads

All generated sequencing data and assemblies are available for browsing and download from GenomeArk.

Prior assembly versions

Notes on downloading files

Files are generously hosted by Amazon Web Services under s3://genomeark. Although the files are available as HTTP links above, downloads are faster with the Amazon Web Services command-line interface. To use it, rewrite the links to use the s3:// addressing scheme. Adjusting settings such as max_concurrent_requests, as described in this guide, will improve download performance further.
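
As a minimal sketch of the link rewriting described above, the helper below converts an HTTP link into its s3:// form. It assumes the links follow one of the two standard Amazon S3 URL layouts (bucket in the host name, or bucket as the first path component); the function name and the example path are hypothetical:

```python
from urllib.parse import urlparse

S3_HOST = "s3.amazonaws.com"


def http_to_s3(url: str) -> str:
    """Rewrite an S3 HTTP(S) link as an s3:// URI."""
    parsed = urlparse(url)
    host = parsed.netloc
    path = parsed.path.lstrip("/")

    if host.endswith("." + S3_HOST):
        # Virtual-hosted style: https://<bucket>.s3.amazonaws.com/<key>
        bucket = host[: -len("." + S3_HOST)]
        key = path
    elif host == S3_HOST:
        # Path style: https://s3.amazonaws.com/<bucket>/<key>
        bucket, _, key = path.partition("/")
    else:
        raise ValueError(f"not a recognized S3 URL: {url!r}")

    if not bucket:
        raise ValueError(f"no bucket name found in: {url!r}")
    return f"s3://{bucket}/{key}"


# Hypothetical file path, for illustration only.
uri = http_to_s3("https://genomeark.s3.amazonaws.com/species/example/assembly.fa.gz")
```

The resulting URI can then be passed to the AWS command-line interface, for example to `aws s3 cp`. For a public bucket, the `--no-sign-request` option lets the command run without AWS credentials.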

Contact

For any problems related to this dataset, please raise issues on this GitHub repository. For general questions regarding the project, please contact [email protected]. More information about our consortium can be found on the T2T homepage.

History

* Dec 2022. v1 release.
* Nov 2023. v2 release.
* Dec 2023. hap1 and hap2 swapped in mPonAbe1 chr14 (hsa13) and mSymSyn1 chr3 to keep the rDNA-containing or higher quality haplotype in hap1 and in the primary assembly.
* May 2024. mSymSyn1 v2.1 release. Chr12 and Chr19 were swapped to follow prior chromosome assignments for this species.