Module 1: Exploring Orthology with the OMA Browser

Browsing the Gene Page

Gene-centric pages in OMA collect all the information specific to a single gene. The gene is shown at the top of the page, with its OMA ID and UniProt ID. Sub-pages with specific information are available from the scrollable menu on the left, including the orthologs, paralogs, gene information, isoforms, GO annotations, sequences, and local synteny.

You ran a network analysis and found that the human gene with UniProt ID OR2L5_HUMAN is involved in an interesting pathway. Search for this gene on the OMA homepage.

  • 1. Based on the gene information tab, what is this gene?

    Olfactory receptor family 2 subfamily L member 5. Apparently one of many olfactory receptors in humans. More information can be accessed by following the link to Ensembl or UniProt.

  • 2. Where is this gene encoded in the genome?

    The gene is located on chromosome 1, starting at position 248021948, and ending at 248022886.

  • 3. Based on the Gene Ontology (GO) annotations, what function is this protein probably involved in? How sure is this annotation?

    Not strictly conserved, but a few orthologs of neighbouring genes can be found in the proximity of its orthologs. A striking point is the numerous duplications in other species.

  • 5. Go to the orthologs table. How many orthologs overall are inferred by OMA?

    145 orthologs.

  • 6. How many 1:1 pairwise orthologs are there?

    There are only 5 1:1 relations, all in primates. There was probably a duplication leading to this clade, which explains the m:1 (many-to-one) relations, and there were further duplications in other clades, as we saw with the synteny.

  • 7. How many orthologs are inferred which are supported by HOG, pairwise, and OMA Group evidence?

    18 orthologs are supported by three sources.

  • 8. How conserved is the domain architecture of these orthologs? What is this domain?

    There is only one domain, and it is the same in most species, so the architecture is very conserved. The domain is a Rhodopsin 7-helix transmembrane protein domain.

  • 9. How many paralogs are there in human? When did they duplicate?

    3 paralogs. They appear to have duplicated in Simiiformes or later.
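
The 1:1 and m:1 labels in the orthologs table can be illustrated with a small sketch. It assumes a toy list of pairwise orthologs between two species, with hypothetical gene IDs (A1, B1, ...), and reads "m:1" as "several genes in the first species are orthologous to one gene in the second". It is not how OMA computes the labels internally; it only shows how the cardinality follows from counting partners.

```python
from collections import defaultdict

def relation_types(pairs):
    """Label each (gene_a, gene_b) ortholog pair as 1:1, 1:n, m:1 or m:n."""
    partners_of_a = defaultdict(set)  # gene in species A -> its orthologs in B
    partners_of_b = defaultdict(set)  # gene in species B -> its orthologs in A
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    labels = {}
    for a, b in pairs:
        # Left side: how many genes of species A share the ortholog b.
        left = "1" if len(partners_of_b[b]) == 1 else "m"
        # Right side: how many genes of species B share the ortholog a.
        right = "1" if len(partners_of_a[a]) == 1 else "n"
        labels[(a, b)] = f"{left}:{right}"
    return labels

# Hypothetical toy data: A2 and A3 arose by duplication in species A.
toy_pairs = [("A1", "B1"), ("A2", "B2"), ("A3", "B2")]
for pair, label in relation_types(toy_pairs).items():
    print(pair, label)
```

In the toy data, A1 and B1 are 1:1 orthologs, whereas A2 and A3 are both orthologous to B2, giving two m:1 relations. This is the pattern expected after a duplication in the lineage leading to species A.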

OMA Groups

OMA Groups are cliques of orthologs based on the orthology graph. In an OMA Group, all the genes are connected to each other by pairwise orthologous relations. For this reason, OMA Groups are typically conservative (they may exclude true orthologs) but have high confidence.
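
The clique requirement can be made concrete with a minimal sketch. It assumes the orthology graph is given as a list of pairwise relations between hypothetical genes A, B, C and D, and simply checks whether every pair within a candidate set is connected. It illustrates the definition only and is not OMA's group-building procedure.

```python
from itertools import combinations

def is_clique(genes, ortholog_pairs):
    """Return True if every pair of genes is linked by a pairwise ortholog relation."""
    edges = {frozenset(pair) for pair in ortholog_pairs}
    return all(frozenset((g, h)) in edges for g, h in combinations(genes, 2))

# Hypothetical orthology graph: D is an ortholog of C only.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

print(is_clique(["A", "B", "C"], pairs))       # every pair is connected
print(is_clique(["A", "B", "C", "D"], pairs))  # A-D and B-D are missing
```

Here D is orthologous to C but cannot join the group because it lacks relations to A and B. This shows why a clique-based group can exclude true orthologs while keeping only well-supported members.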

Open the OMA Groups page of the gene from before. If you are starting the module from here, search for OMA Group 297702.

  • 1. How many members are there in the group? Why is it different from the number of pairwise 1:1 orthologs from before?

    There are 19. This is more than the number of 1:1 orthologs because group members do not need to be in a 1:1 relation with the query gene; they only need to form a clique, with every member orthologous to every other member.

  • 2. What is the signature sequence for the group (a sequence present in members of this group but not in other groups)?

    The group’s signature is MTFAGAE. It can be used to track OMA Groups across releases.

  • 3. Look at the alignment. How conserved are the sequences? Some proteins have different sizes, how can we explain that?

    The conservation is quite high (with 80-90% identity). Some sequences start at different positions, but this does not seem right. VULVU37201 could start at the second start codon (M) and HORSE05078 is probably missing the start codon.

Hierarchical Orthologous Groups

The evolution of a gene family describes the history of all the genes that share a common ancestral gene. These genes, called homologs, can be divided into orthologs, which started diverging by speciation, and paralogs, which started diverging by duplication. In comparative genomics, gene families are a fundamental resource because they represent the links between organisms from a gene-centric perspective and allow us to understand how genes and genomes have evolved over time.
A HOG is a set of genes that have descended from a common ancestral gene in a given ancestral species (i.e. at a specific taxonomic level). HOGs are hierarchical because groups defined at more recent clades are encompassed within larger groups defined at older clades, which makes them nested subfamilies.
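
The nesting described above can be sketched as a small tree structure. The class name `Hog`, the taxonomic levels and the gene IDs below are hypothetical and chosen only for illustration; this is not OMA's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Hog:
    level: str                                     # taxonomic level of the ancestral gene
    genes: list = field(default_factory=list)      # extant genes attached directly
    children: list = field(default_factory=list)   # sub-HOGs defined at more recent clades

    def members(self):
        """Collect all extant genes descending from this HOG's ancestral gene."""
        found = list(self.genes)
        for child in self.children:
            found.extend(child.members())
        return found

# Hypothetical family: a Primates sub-HOG nested inside a Mammalia root HOG.
primates = Hog(level="Primates", genes=["HUMAN_1", "CHIMP_1"])
root = Hog(level="Mammalia", genes=["MOUSE_1"], children=[primates])

print(primates.members())
print(root.members())
```

The members of the Primates HOG are a subset of the members of the Mammalia HOG, which is the sense in which a HOG at a more recent clade is a subfamily of the HOG at an older clade.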

The following exercises focus on analysing the evolutionary history of a gene family. For an introduction to the iHam graphical viewer (needed to answer the following questions), see our documentation and YouTube video. Open the HOG page corresponding to the gene from before. It is the largest HOG in which this gene is present (the root HOG). If you are starting the module from here, search for the HOG HOG:0434208.

  • 1. At what taxonomic level is the last common ancestral gene located? From what common ancestral genome did all these genes descend? At what taxonomic level did this gene originate? When did the root HOG originate?

    Hint: these are all different ways of asking the same question.


  • 2. How many ancestral genes comprise this HOG at the root level?


  • 3. How many extant genes comprise this HOG at the root level?

    173 genes

  • 4. Which extant genome has the most copies of this gene?

    Fukomys damarensis with 18 genes

  • 5. Which taxonomic clade has an unusually high GC content?

    Hint: set colour scheme under Options.


  • 6. How many genes in this family (i.e., the root HOG) are human genes?


  • 7. In which lineages did the duplications which led to the multiple human genes likely take place?

    Primates, Simiiformes and Homininae

  • 8. How many genes in this family have five exons? In what species?

    1, in Pteropus vampyrus

  • 9. Freeze the tree at the Simiiformes level. Based on the iHam visualisation, which ancestral clade experienced loss of the ancestral gene “HOG:0434208.10b”?