Sequence and Structure

How exactly do DNA and protein sequences relate to biological function? Which subsequences are critical, and which are of little importance? These questions can be answered to a great degree by comparing the many complete and draft genomes now available.

The billion-year experiment of eukaryotic evolution has tested every DNA and protein residue to optimise organismal function. Even between 'close' species such as human and mouse, over two-thirds of all unselected residues have been altered, so by comparing several complete or almost complete genomes (human, chimp, rat, mouse) we see the results of an almost complete mutagenesis of the mammalian genome. Comparison of orthologs ('phylogenetic footprinting') reveals the regions and residues that are key to structure and function, including novel domains, active motifs, post-translational modification sites, conserved secondary structure elements and other regions.

Raw comparison of genomes yields large amounts of conflicting data. We use proprietary techniques and expert curation to determine regions and residues that are under functional or structural pressure to be conserved.

The result is a map, for any protein or genomic region and for any subset of genomes, of the level of selective pressure on each residue, motif or region. For protein sequences, the map also gives an interpretation of how much of that conservation is due to structural, as opposed to functional, constraints. For DNA regions, such comparisons are used to find important control elements in promoters, and have been used successfully to build transcriptional networks in silico.
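
To make the idea of a per-residue conservation map concrete, here is a minimal sketch in Python. It is not the proprietary method described above. It assumes the orthologs have already been aligned, and it scores each alignment column by Shannon entropy, a common generic measure of variability. The function name `column_conservation`, the `alphabet_size` parameter and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_conservation(aligned_seqs, alphabet_size=20):
    """Return one conservation score in [0, 1] per alignment column.

    A score of 1.0 means every sequence has the same residue at that
    position; lower scores mean the position is more variable.
    Gap characters ('-') are ignored. alphabet_size is an assumption:
    20 for amino acids, 4 for nucleotides.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    max_entropy = math.log2(alphabet_size)
    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        # Shannon entropy of the residue distribution in this column.
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy aligned protein fragments (invented, not real orthologs).
toy_orthologs = ["MKTAY", "MKSAY", "MRTAY", "MK-AF"]
toy_scores = column_conservation(toy_orthologs)
for position, score in enumerate(toy_scores, start=1):
    print(position, round(score, 3))
```

A score like this only measures raw variability. It does not correct for the evolutionary distance between the species compared, and it cannot by itself separate structural from functional constraint, which is where curation and further analysis come in.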