4 Protein families and structural evolution
The availability of genomic sequence data from every major taxonomic group of organisms on Earth has allowed extensive comparisons to be made between their protein-coding regions; by 2003, more than 800 000 protein sequences from these organisms were available for comparison. These comparisons have revealed that the amino acid sequences of many proteins are similar enough to indicate homology, even between apparently distantly related organisms. In some proteins, this homology extends across the entire sequence; in others, it is confined to small regions called conserved domains.
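Homology is inferred from measurable sequence similarity, which is often summarized as percent identity over an alignment. The Python sketch below is illustrative only. It assumes the two sequences have already been aligned to equal length, with gaps written as `-`. It also uses made-up toy sequences. A sliding window shows how similarity can be concentrated in one region, as it is for a conserved domain. Real analyses use dedicated alignment and search tools with statistical scoring rather than raw identity.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Skip any column containing a gap in either sequence.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def windowed_identity(a: str, b: str, window: int) -> list[float]:
    """Percent identity in each window of fixed width sliding along the alignment."""
    return [percent_identity(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]


# Toy, hypothetical aligned sequences (not real proteins).
seq1 = "MKTAYIAKQRGHWLLSPE"
seq2 = "MRSGFLAKQRGHWLLTQD"

print(round(percent_identity(seq1, seq2), 1))  # 55.6
scores = windowed_identity(seq1, seq2, 9)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])  # 6 100.0
```

Here the overall identity is moderate. The 9-residue window starting at position 6 is, however, identical in both sequences, which mimics similarity confined to a conserved region.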
The biochemical function of almost 80% of these conserved domains is known; examples include the ATPase domain and the lipase domain. This means that, for many proteins, aspects of biochemical function can be predicted solely by examining the domains they contain. Sequence comparisons are therefore a powerful predictive tool and are performed routinely in molecular research.
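Predicting function from domain content amounts to looking up each detected domain in a table of domains whose function is already known. The Python sketch below is a minimal illustration of that idea. The table `DOMAIN_FUNCTIONS` is a hypothetical, hand-made stand-in for a curated domain database, and the domain label `unknown_domain_1` is invented. Domains that are absent from the table are reported separately, mirroring the fraction of conserved domains whose function is still unknown.

```python
# Hypothetical lookup table standing in for a curated domain database.
DOMAIN_FUNCTIONS: dict[str, str] = {
    "ATPase": "hydrolyses ATP",
    "lipase": "hydrolyses ester bonds in lipids",
}


def predict_functions(domains: list[str]) -> tuple[list[str], list[str]]:
    """Split a protein's domains into predicted functions and unassigned domains."""
    predicted: list[str] = []
    unassigned: list[str] = []
    for domain in domains:
        if domain in DOMAIN_FUNCTIONS:
            predicted.append(f"{domain} domain: {DOMAIN_FUNCTIONS[domain]}")
        else:
            # No known function for this domain, so no prediction is made.
            unassigned.append(domain)
    return predicted, unassigned


predicted, unassigned = predict_functions(["ATPase", "unknown_domain_1"])
print(predicted)   # ['ATPase domain: hydrolyses ATP']
print(unassigned)  # ['unknown_domain_1']
```

Such a lookup yields only biochemical activity. It says nothing about the protein's interaction partners or the pathways in which it acts.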
For example, when the genome sequence of an organism first becomes available, comparison with sequences from other organisms allows the functions of most of its proteins to be predicted quickly. In the case of clinically important bacteria or viruses, this knowledge can allow rapid identification of drug or vaccine targets. Knowing the functional domains within a protein does not, of course, necessarily reveal the role that the protein plays within a cell. For example, knowing that a protein is an ATPase tells us nothing about which other proteins it interacts with in the cell or in which pathways it functions.