Web-based database interface for orthology prediction

OMA in a nutshell

The OMA (“Orthologous MAtrix”) project is a method and database for the inference of orthologs among complete genomes. The distinctive features of OMA are its broad scope and size, high quality of inferences, feature-rich web interface, availability of data in a wide range of formats and interfaces, and frequent update schedule of two releases per year.

OMA’s inference algorithm consists of three main phases. First, to infer homologous sequences (sequences of common ancestry), all-against-all Smith-Waterman alignments are computed and significant matches are retained. Second, to infer orthologous pairs (the subset of homologs related by speciation events), mutually closest homologs are identified based on evolutionary distances, taking into account distance inference uncertainty and the possibility of differential gene losses (for more details, see Roth et al. 2008). Third, these orthologs are clustered in two different ways, which are useful for different purposes: (a) we identify cliques of orthologous pairs (“OMA groups”), which are useful as marker genes for phylogenetic reconstruction and tend to be very specific; (b) we identify hierarchical orthologous groups (“HOGs”), which are groups of genes defined for particular taxonomic ranges that comprise all genes descended from a common ancestral gene in that taxonomic range. For more details on the algorithm to infer HOGs from orthologous pairs, see Altenhoff et al. 2013.
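
As a rough illustration of the second phase and of clustering (a), the sketch below picks mutually closest homologs from a small table of pairwise distances and then lists the cliques they form. It is a simplified toy, not OMA's implementation: the function names, the gene and genome labels, the distance values, and the single `tolerance` parameter (a crude stand-in for distance inference uncertainty) are all assumptions made for this example, and the check for differential gene losses is left out entirely.

```python
def mutually_closest_pairs(dist, genome_of, tolerance=0.0):
    """Return cross-genome gene pairs that are each other's closest homolog.

    dist: dict mapping (gene_a, gene_b) -> evolutionary distance (hypothetical input).
    genome_of: dict mapping gene -> genome label.
    tolerance: slack added to the best distance, standing in for uncertainty.
    """
    # Smallest distance from each gene to any gene of each other genome.
    best = {}
    for (a, b), d in dist.items():
        for gene, other in ((a, b), (b, a)):
            key = (gene, genome_of[other])
            if key not in best or d < best[key]:
                best[key] = d

    pairs = set()
    for (a, b), d in dist.items():
        if genome_of[a] == genome_of[b]:
            continue  # orthologs relate genes from different genomes
        closest_for_a = d <= best[(a, genome_of[b])] + tolerance
        closest_for_b = d <= best[(b, genome_of[a])] + tolerance
        if closest_for_a and closest_for_b:
            pairs.add(frozenset((a, b)))
    return pairs


def maximal_cliques(pairs):
    """List maximal cliques of the pair graph (basic Bron-Kerbosch, no pivoting)."""
    adj = {}
    for pair in pairs:
        a, b = tuple(pair)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy data: genome A has two paralogs (a1, a2); genomes B and C have one gene each.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
dist = {
    ("a1", "b1"): 0.10,
    ("a2", "b1"): 0.50,
    ("a1", "c1"): 0.20,
    ("a2", "c1"): 0.60,
    ("b1", "c1"): 0.15,
}

pairs = mutually_closest_pairs(dist, genome_of)
groups = maximal_cliques(pairs)
```

In this toy data, a2 is further from b1 and c1 than a1 is, so it forms no mutually closest pair, and a1, b1, and c1 end up together in a single clique. Exhaustive clique enumeration is only practical for small inputs like this one.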

The OMA pipeline can also run on custom genomic/transcriptomic data using the OMA stand-alone software, and it is even possible to combine precomputed data with custom data by exporting parts of the OMA database.

For more information on the features of the OMA Browser, please consult the help pages accessible from the navigation bar in the top-right corner.


The OMA project was initiated in 2004 at ETH Zurich by Prof. Gaston Gonnet, with the goal of identifying orthologs among all publicly available genomes. At the time, most sequenced genomes were bacterial and only a few were eukaryotic. Several PhD students in his group became increasingly involved, in particular Adrian Schneider, Christophe Dessimoz, and Alexander Roth. Over the subsequent 10 years, OMA underwent 16 major releases, steadily increasing the number of genomes under consideration.

Graph showing growth of OMA over time.

The OMA Browser was introduced in 2006. Early releases were developed by Adrian Schneider and Christophe Dessimoz. Adrian Altenhoff joined the team in 2008.

In 2008, the responsibility of “baby-sitting” the all-against-all (i.e. importing and converting genomes, running and verifying computations across hundreds of CPUs) was handed over from Gaston to the two Adrians.

Since 2010, Adrian Altenhoff has been the main baby-sitter of the all-against-all and manager of OMA’s operations. In 2011, Christophe joined Gaston as co-PI of OMA. In 2012, OMA became a SIB-funded bioinformatics resource.


OMA overview

