Module 1: Finding orthology with the OMA Browser

The OMA browser serves as an access point for the OMA database, which contains precomputed homology data for over 2900 extant genomes and 1133 ancestral genomes (see the latest list of species).

The OMA browser focuses on three main data types: genes, groups, and genomes.

  • Gene-centric pages provide detailed information about a specific gene, including its sequence, cross-references, functional annotations, and evolutionary data.
  • Group-centric pages classify genes into OMA Groups (Orthologous Groups; OGs) and Hierarchical Orthologous Groups (HOGs) to define families and subfamilies.
  • Genome-centric pages offer information about extant or ancestral species, associated genes, related genomes, and synteny viewers.

1.1. Browsing the gene page

Gene-centric pages give all the information OMA holds for a single gene. The gene is shown at the top of the page, with its OMA ID and UniProt ID. The scrollable menu on the left-hand side links to sub-pages with specific information, including the orthologs, paralogs, gene information, isoforms, GO annotations, sequences, and local extant and ancestral synteny for this gene.

Consider a scenario where you ran a gene network analysis and found that the human gene with UniProt ID OR2L5_HUMAN is involved in an interesting pathway. Search for this gene on the OMA homepage.

  • 1. Based on the “Gene information” tab, what is this gene?

    Olfactory receptor family 2 subfamily L member 5, one of many human olfactory receptors. More information can be accessed by following the link to Ensembl or UniProt.

  • 2. Where is this gene located in the genome?

    The gene is located on chromosome 1, starting at position 248021948, and ending at 248022886.

  • 3. Based on the Gene Ontology annotations, what function is this protein probably involved in? How sure are these annotations?

    The protein is probably an olfactory receptor involved in smell detection. The “olfactory receptor activity” Molecular Function has the codes IEA (Inferred from Electronic Annotation) and IBA (Inferred from Biological aspect of Ancestor) in the Evidence and reference column. These annotations should therefore be treated with caution, since they were not confirmed experimentally.

  • 4. Does this gene share any localized conserved synteny among any other species in Hominidae? If so, which ones?

    Click on the “Local synteny” tab on the left panel. The synteny viewer will show the gene and its neighboring genes (coloured rectangles) in the query genome, and in genomes of closely related species. Click on the genes to see more information about them.

    We can see that this gene is not strictly conserved, but we can find a few of its neighboring genes that are also orthologous to neighboring genes in other Hominidae species. Another striking point is that the genes surrounding the query gene are also olfactory receptors, indicating they likely arose from tandem duplication.

  • 5. Go to the orthologs table. How many orthologs are inferred by OMA overall?

    152 orthologs.

  • 6. How many 1:1 orthologs are there for the OR2L5_HUMAN gene?

    You can sort orthologs by their relation type by clicking the arrow on the column's header. Alternatively, you can type “1:1” in the search bar above the table.

    There are only four 1:1 relations, all in Hominidae. This means that this specific gene was present in the ancestral Hominidae and there were no further duplications of this specific copy. Ancestral duplications before this clade or lineage-specific duplications in other clades explain the m:1 orthologs (many co-orthologs in the other genome to one ortholog in human).

  • 7. How conserved is the domain architecture of these orthologs? What is this domain?

    Most species have only one domain, with identical annotation, so the domain architecture is very conserved. The domain is a Rhodopsin 7-helix transmembrane protein domain, a transmembrane domain common in olfactory receptors.

  • 8. How many paralogs are there in human for this gene? When did they duplicate?

    There are 3 other paralogs. They appear to have been duplicated in Catarrhini or later.
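
The reasoning in question 3 (checking whether any annotation is backed by experimental evidence) can be sketched in a few lines of Python. This is a minimal illustration, not part of the OMA Browser: the grouping follows the Gene Ontology evidence code categories, while the function names and the `annotations` list are hypothetical and only mirror the two codes mentioned in the answer above.

```python
# Evidence codes grouped by Gene Ontology evidence category.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                "HTP", "HDA", "HMP", "HGI", "HEP"}
PHYLOGENETIC = {"IBA", "IBD", "IKR", "IRD"}
ELECTRONIC = {"IEA"}


def evidence_category(code):
    """Return a coarse category for a GO evidence code."""
    code = code.upper()
    if code in EXPERIMENTAL:
        return "experimental"
    if code in PHYLOGENETIC:
        return "phylogenetic inference"
    if code in ELECTRONIC:
        return "electronic annotation"
    return "other"


def has_experimental_support(annotations):
    """True if at least one (term, code) pair has an experimental code."""
    return any(evidence_category(code) == "experimental"
               for _term, code in annotations)


# Hypothetical input: (GO term, evidence code) pairs copied by hand
# from the GO annotations tab of a gene page.
annotations = [
    ("olfactory receptor activity", "IEA"),
    ("olfactory receptor activity", "IBA"),
]

for term, code in annotations:
    print(f"{term}: {code} ({evidence_category(code)})")
print("experimentally supported:", has_experimental_support(annotations))
```

With the two codes from the answer above, neither annotation falls in the experimental category, which is why the function is treated with caution.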

1.2. Exploring Hierarchical Orthologous Groups

The evolution of a gene family describes the history of all the genes that descended from a common ancestral gene.

A Hierarchical Orthologous Group (HOG) is a set of genes that have descended from a common ancestral gene in a given ancestral species (i.e. at a specific taxonomic level). HOGs are hierarchical because groups defined at more recent clades are encompassed within larger groups defined at older clades, which makes them nested subfamilies.
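
The nesting described above can be pictured as a small tree. The sketch below is only a toy model for building intuition, not OMA's actual data structure: the `Hog` class, the gene labels and the three-level example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hog:
    """Toy HOG: a taxonomic level, genes attached directly, nested sub-HOGs."""
    level: str
    genes: list[str] = field(default_factory=list)
    subhogs: list[Hog] = field(default_factory=list)

    def extant_genes(self) -> list[str]:
        """Collect every extant gene in this HOG, including nested sub-HOGs."""
        found = list(self.genes)
        for sub in self.subhogs:
            found.extend(sub.extant_genes())
        return found


# Hypothetical family: gene labels are placeholders, not real OMA entries.
hominidae = Hog("Hominidae", genes=["human_1", "chimp_1"])
primates = Hog("Primates", genes=["macaque_1"], subhogs=[hominidae])
root = Hog("Eutheria", genes=["mouse_1", "cow_1"], subhogs=[primates])

for hog in (root, primates, hominidae):
    print(hog.level, len(hog.extant_genes()), hog.extant_genes())
```

Each younger group is a subset of the older group that contains it, so the gene count can only stay the same or shrink when moving from the root towards a more recent clade.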

The following exercises focus on analyzing the evolutionary history of a gene family. For an introduction to the iHam graphical viewer (needed to answer the following questions), see our documentation.

Open the HOG page corresponding to the gene from before (OR2L5_HUMAN -> Click on the Groups button). The HOG displayed is the largest HOG in which this gene is present (known as a “Root HOG” in OMA).

  • 1. At what taxonomic level is the last common ancestral gene located? From what common ancestral genome did all these genes descend? At what taxonomic level did this gene originate? When did the Root HOG originate?

    These are all different ways of asking the same question. You can find the root of the HOG by mousing over the root node on the species tree.

    Eutheria

  • 2. How many ancestral genes comprise this HOG at the root level?

    1 ancestral gene, by definition of a HOG. This means that there was an ancestral species whose genome contained this ancestral gene. Through speciation and duplication events, it gave rise to all the extant genes in the HOG.

  • 3. How many extant genes comprise this HOG at the root level?

    In the header under the HOG ID, the root level is the first taxonomic clade from the left. You can click on the clade to access the Root HOG entry.

    198 genes

  • 4. Which extant genomes have the most copies of this gene?

    In the iHam graphical viewer, each square represents a gene. Look for the species with the most squares.

    Fukomys damarensis (Damaraland mole-rat) with 18 genes.

  • 5. How many genes in this family (i.e. root HOG) are human genes?

    4

  • 6. In which lineages did the duplications likely take place that resulted in the multiple human genes?

    Mouse over each node of the tree from the root to humans and see when a new vertical line (which represents a duplication event) appears.

    The duplications took place in the lineages leading to Primates, Catarrhini and Hominidae.

  • 7. How many genes in this family have 5 exons? In what species?

    Set color scheme under Options.

    1, in Pteropus vampyrus

1.3. Browsing the Genomes page

Genomes on the OMA Browser can be either extant (modern-day species) or ancestral. OMA leverages HOGs to model ancestral genomes; these ancestral genomes each correspond to an internal node of the Tree of Life. Conceptually, HOGs can be thought of as ancestral genes, as they encompass orthologs and paralogs descending from a common ancestral gene at a specific taxonomic level. Thus, the HOGs are proxies for ancestral genes in a common ancestor and the collection of HOGs at a given level are proxies for ancestral genomes.

We will first explore an extant genome: human. Search for it by typing “HUMAN” in the search bar and choosing species for the field, or go from the home page -> Explore -> Quick access to Genomes. Open the extant human genome.

  • 1. How many genes are in this species, not including alternative splice variants?

    There are 20,430 genes in this species.

Next, let’s explore one of the ancestral genomes leading to human: Primates. Click on this genome to get to the Ancestral Genome page.

  • 2. How many genes was this primate common ancestor inferred to have?

    38,534 genes

HOG inference may not always be 100% reliable. OMA provides a “Completeness Score” to measure HOG quality. The Completeness Score is defined as the number of species in the taxonomic clade that are present in the HOG, divided by the total number of species in that clade. As a general rule of thumb, we consider HOGs with a Completeness Score >= 30% to be reliable.
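
The definition above translates directly into a ratio and a threshold filter. The following is a minimal sketch under stated assumptions: the function names, the HOG identifiers and the ten-species clade are hypothetical, and the browser computes this score for you.

```python
def completeness_score(species_in_hog, species_in_clade):
    """Fraction of the clade's species that have at least one gene in the HOG."""
    clade = set(species_in_clade)
    if not clade:
        raise ValueError("the clade must contain at least one species")
    return len(set(species_in_hog) & clade) / len(clade)


def filter_hogs(hogs, species_in_clade, min_score):
    """Return the IDs of HOGs whose completeness score is >= min_score."""
    return [hog_id for hog_id, species in hogs.items()
            if completeness_score(species, species_in_clade) >= min_score]


# Hypothetical clade of ten species and three hypothetical HOGs.
clade = [f"sp{i}" for i in range(10)]
hogs = {
    "HOG_A": clade,        # present in all 10 species
    "HOG_B": clade[:3],    # present in 3 of 10 species
    "HOG_C": clade[:1],    # present in 1 of 10 species
}

for hog_id, species in hogs.items():
    print(hog_id, completeness_score(species, clade))

print(filter_hogs(hogs, clade, 0.3))  # rule-of-thumb threshold
print(filter_hogs(hogs, clade, 0.9))  # stricter threshold
```

Raising the threshold keeps only HOGs represented in nearly every species of the clade, which is why a stricter filter gives a smaller count of ancestral genes.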

  • 3. How many genes were in the primate common ancestor if we filter to only HOGs with at least 90% of the species present in the HOG?

    From the Primates ancestral genome page, click on Ancestral Genes and make sure 'Remove HOGs with completeness score below' = 0.9. Find the filtered number of ancestral genes below the table.

    13,617 genes