Gene-centric pages in OMA collect all the information specific to a single gene. The gene is shown at the top of the page, with its OMA ID and UniProt ID. Sub-pages with specific information are available from the scrollable menu on the left, including the orthologs, paralogs, gene information, isoforms, GO annotations, sequences, and local synteny.
You ran a network analysis and found that the human gene with UniProt ID OR2L5_HUMAN is involved in an interesting pathway. Search for this gene on the OMA homepage.
1. Based on the gene information tab, what is this gene?
2. Where is this gene encoded in the genome?
3. Based on the Gene Ontology (GO) annotations, in which molecular function and biological process is this protein probably involved? How reliable is this annotation?
4. Does this gene share any local conserved synteny with other species? If so, which ones?
5. Go to the orthologs table. How many orthologs overall are inferred by OMA?
6. How many 1:1 pairwise orthologs are there?
7. How many of the inferred orthologs are supported by HOG, pairwise, and OMA Group evidence?
8. How conserved is the domain architecture of these orthologs? What is this domain?
9. How many paralogs are there in Human? When did they duplicate?
OMA Groups are cliques of orthologs based on the orthology graph. In an OMA Group, all the genes are connected to each other by pairwise orthologous relations. For this reason, OMA Groups are typically conservative (they may exclude true orthologs) but have high confidence.
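To make the clique idea concrete, here is a minimal sketch that checks whether a set of genes is fully connected by pairwise orthologous relations. The gene names and the list of pairs are hypothetical toy data, not taken from OMA, and the function only tests the clique property. It is not how OMA itself builds its groups.

```python
from itertools import combinations


def is_clique(genes, ortholog_pairs):
    """Return True if every pair of genes is linked by a pairwise orthology edge."""
    # Store edges as unordered pairs so (a, b) and (b, a) are equivalent.
    edges = {frozenset(pair) for pair in ortholog_pairs}
    return all(frozenset((a, b)) in edges for a, b in combinations(genes, 2))


# Hypothetical pairwise orthologs (toy data, for illustration only).
pairs = [
    ("HUMAN_g1", "MOUSE_g1"),
    ("HUMAN_g1", "RAT_g1"),
    ("MOUSE_g1", "RAT_g1"),
    ("HUMAN_g1", "DOG_g1"),  # DOG_g1 is orthologous to the human gene only
]

print(is_clique(["HUMAN_g1", "MOUSE_g1", "RAT_g1"], pairs))            # True
print(is_clique(["HUMAN_g1", "MOUSE_g1", "RAT_g1", "DOG_g1"], pairs))  # False
```

In this toy example the dog gene is an inferred ortholog of the human gene, but it cannot join the group because it lacks edges to the mouse and rat genes. This is one way a group can end up with fewer members than a gene has pairwise orthologs.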
Open the OMA Groups page of the gene from before. If you are starting the module from here, search for the OMA Group 297702.
1. How many members are there in the group? Why is it different from the number of pairwise 1:1 orthologs from before?
2. What is the signature sequence for the group (a sequence present in members of this group but not in other groups)?
3. Look at the alignment. How conserved are the sequences? Some proteins differ in length. How can we explain that?
The evolution of a gene family describes the history of all the genes that share a common ancestral gene. These genes, called homologs, are classified as orthologs if they started diverging through a speciation event and as paralogs if they started diverging through a duplication event. In comparative genomics, gene families are a fundamental resource: they represent the links between organisms from a gene-centric perspective and allow us to understand how genes and genomes have evolved over time.
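The ortholog/paralog distinction can be sketched in code: two genes are classified by the type of event at their last common ancestor in a labelled gene tree. The tree below is a hypothetical toy example encoded as nested tuples of the form `(event, left, right)`, with gene names as leaves. It illustrates the definition only and is not OMA's inference algorithm.

```python
def leaves(node):
    """Return the set of gene names under a node of the toy gene tree."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relation(tree, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    if not {a, b} <= leaves(tree):
        raise ValueError("both genes must be present in the tree")
    event, left, right = tree
    # Descend while a single child subtree still contains both genes.
    for child in (left, right):
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return relation(child, a, b)
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical gene tree: one duplication followed by a speciation in each copy.
tree = (
    "duplication",
    ("speciation", "human_A", "mouse_A"),
    ("speciation", "human_B", "mouse_B"),
)

print(relation(tree, "human_A", "mouse_A"))  # orthologs
print(relation(tree, "human_A", "human_B"))  # paralogs
print(relation(tree, "human_A", "mouse_B"))  # paralogs
```

Note that `human_A` and `mouse_B` come from different species but are still paralogs, because the event that first separated their lineages was the duplication.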
A HOG is a set of genes that have descended from a common ancestral gene in a given ancestral species (i.e. at a specific taxonomic level). HOGs are hierarchical because groups defined at more recent clades are encompassed within larger groups that are defined at older clades, thus making them nested subfamilies.
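The nesting can be illustrated with a small sketch in which each taxonomic level maps to a list of HOGs, each represented as a set of genes. The gene names and group memberships are hypothetical toy data. The check confirms that every HOG defined at a younger clade is contained in a HOG defined at the next older clade.

```python
# Hypothetical HOGs per taxonomic level, ordered from oldest to youngest clade.
hogs_by_level = {
    "Eutheria": [{"human_A", "human_B", "mouse_A", "dog_A"}],
    "Euarchontoglires": [{"human_A", "human_B", "mouse_A"}],
    # A duplication before the Primates ancestor splits the family in two here.
    "Primates": [{"human_A"}, {"human_B"}],
}


def is_nested(younger_hogs, older_hogs):
    """Return True if each younger HOG is a subset of some older HOG."""
    return all(any(y <= o for o in older_hogs) for y in younger_hogs)


levels = list(hogs_by_level)  # dicts preserve insertion order
for older, younger in zip(levels, levels[1:]):
    nested = is_nested(hogs_by_level[younger], hogs_by_level[older])
    print(f"{younger} HOGs nested within {older} HOGs: {nested}")
```

In this toy family a single ancestral gene at the Eutheria level corresponds to two subfamilies at the Primates level, which is the pattern a duplication leaves in the hierarchy.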
The following exercises focus on analysing the evolutionary history of a gene family. For an introduction to the iHam graphical viewer (needed to answer the following questions), see our documentation and YouTube video. Open the HOG page corresponding to the gene from before. It is the largest HOG in which this gene is present (the root HOG). If you are starting the module from here, search for HOG:0434208.
1. At what taxonomic level is the last common ancestral gene located? From what common ancestral genome did all these genes descend? At what taxonomic level did this gene originate? When did the root HOG originate?
2. How many ancestral genes comprise this HOG at the root level?
3. How many extant genes comprise this HOG at the root (Eutheria) level?
4. Which extant genome has the most copies of this gene?
5. Which taxonomic clade has an unusually high GC content in its gene sequences?
6. How many genes in this family (i.e., the root HOG) are human genes?
7. In which lineages did the duplications which led to the multiple human genes likely take place?
8. How many genes in this family have five exons? In what species?
9. Freeze the tree at the Simiiformes level. Based on the iHam visualisation, which ancestral clade lost the ancestral gene “HOG:0434208.10c”?