Ensembl - adding value to animal genomes through high quality annotation

Funder: Biotechnology and Biological Sciences Research Council

Dimensions: grant.8483581


Eory, Lel (1)



(1) University of Edinburgh, grid.4305.2

Research Organisations


United Kingdom




This project will deliver high-quality, up-to-date annotated genomes for key farmed and domesticated animals to enable research on these economically and socially important species. Research on domesticated animals has important socio-economic impacts. It underpins and accelerates improvements in the animal sector of agriculture, contributes to medical research by providing animal models, improves animal health and welfare, and informs our understanding of natural and wild animal populations. High-quality annotated genome sequences are key resources for such research. The sequence of almost the entire genome (a reference genome sequence) has been determined for major farmed and domesticated animal species such as cattle, goats, sheep, pigs, chickens, ducks, turkeys, dogs and horses, as well as for several important fish species, including cod, rainbow trout, salmon and tilapia. However, the strings of billions of bases (represented by the four letters A, C, G and T) that make up these genome sequences are not particularly useful or understandable on their own. Once a genome has been sequenced, it needs to be 'annotated' (i.e. explanatory notes need to be added to identify key features within the genome sequence) so that research scientists can make sense of it. Annotating reference genome sequences with features such as the locations of the coding and regulatory parts of genes, and the bases that differ between individuals within a species (genetic variants), greatly enhances the value and utility of the genome sequence. Making the genome sequences and their annotations freely accessible in a visual form adds further value. Ensembl provides a means for researchers to look at, or 'browse', the annotated genome information.
The databases and tools provided by Ensembl have proved a powerful and effective means of annotating the complex genomes of animal species, including humans, mice and, more recently, farmed and domesticated animals. Enabled by advances in genome sequencing technologies and associated computational methods, scientists around the world are generating more and better genome sequences. Because the genome sequence of a single individual does not completely represent the genetic make-up of a species, scientists are also sequencing multiple individuals within each species. Individual research groups and international consortia are also generating sequence information that our annotation and analysis pipelines can use to identify both coding and regulatory sequences. We will use these data to annotate the genomes of farmed and domesticated animals, including aquaculture species. We will run comparative analyses to compare genomes both between species and between individuals within a species. These richly annotated genome sequences, which are in effect maps of where the coding genes and regulatory sequences are located, will be made freely available to the scientific community and others via the Ensembl Genome Browser on the World Wide Web, and via an Application Programming Interface (API) for power users. We will also provide between-species and within-species comparative views. The annotated genomes that we will deliver are valuable not only to academic researchers but also to scientists working in industry, including the animal breeding, animal health and pharmaceutical sectors. Reference genome sequences remain current and useful only if they are kept up to date by characterising new genome sequences and integrating new data as they become available.
Technical Summary
High-quality annotated genomes are essential resources for life sciences research.
Draft reference genome sequences have been established for several farmed and domesticated animals: cattle, goat, pig, sheep; chicken, duck, turkey; dog, horse; rainbow trout, salmon, tilapia. Substantially improved genome assemblies have been produced for several of these species (goat, pig, cattle, sheep, water buffalo, chicken) using long-read sequencing technologies. The annotation of these genomes still has gaps in transcript complexity, non-coding genes, pseudogenes and regulatory sequences. Moreover, the pseudo-haploid genome sequence of one individual provides an incomplete view of a species' genome. Scientists are generating more and better genome sequences for additional species and for additional individuals within a species. Researchers, especially in the FAANG and FAASG consortia, are generating functional data for the annotation of coding, non-coding and regulatory sequences. We will analyse and annotate farmed and domesticated animal genomes as they are released, exploiting the growing volumes of functional data (short- and long-read RNA-seq and transcript sequences, ChIP-seq, ATAC-seq, CAGE, bisulfite sequencing) to identify coding genes, non-coding genes and regulatory sequences. We will acquire data from re-sequencing projects to characterise genetic variation within species (SNPs, indels, structural variants) and display this variation in its genomic context. We will run comparative genomics analyses both between and within species. We will disseminate the resulting richly annotated genome sequences freely via the Ensembl Genome Browser and via an API for power users. These annotated genomes will provide an integrated view of functional sequences (coding, non-coding and regulatory) and sequence variation for single or multiple individuals of key farmed and domesticated animals. To maximise use of this resource, we will provide demonstrations and both online and face-to-face training.

Funding information

Funding period: 2019-2022

Funding amount: EUR 420492

Grant number: BB/S02008X/1
