Abstract
The common marmoset (Callithrix jacchus) is one of the most widely used nonhuman primate models of human disease. Owing to limitations in sequencing technology, early genome assemblies of this species using short-read sequencing suffered from gaps. In addition, the genetic diversity of the species has not yet been adequately explored. Using long-read genome sequencing and expert annotation, we generated a high-quality genome resource creating a 2.898 Gb marmoset genome in which most of the euchromatin portion is assembled contiguously (contig N50 = 25.23 Mbp, scaffold N50 = 98.2 Mbp). We then performed whole genome sequencing on 84 marmosets sampling the genetic diversity from several marmoset research centers. We identified a total of 19.1 million single nucleotide variants (SNVs), of which 11.9 million can be reliably mapped to orthologous locations in the human genome. We also observed 2.8 million small insertion/deletion variants. This dataset includes an average of 5.4 million SNVs per marmoset individual and a total of 74,088 missense variants in protein-coding genes. Of the 4956 variants orthologous to human ClinVar SNVs (present in the same annotated gene and with the same functional consequence in marmoset and human), 27 have a clinical significance of pathogenic and/or likely pathogenic. This important marmoset genomic resource will help guide genetic analyses of natural variation, the discovery of spontaneous functional variation relevant to human disease models, and the development of genetically engineered marmoset disease models.