1.4.3 Protein domains
An important concept in protein structure is that of the protein domain. In many cases, a single polypeptide can be seen to contain two or more physically distinct substructures, known as domains. Often linked by a flexible hinge region, these domains are compact and stable, with a hydrophobic core. Domains fold independently of the rest of the polypeptide, satisfying most of their residue–residue contacts internally. Typically, two or more layers of secondary structural elements effectively screen the hydrophobic core from the aqueous environment. A minimum size of 40–50 residues is required, though some domains can consist of up to 350 residues. It is estimated empirically that the number of different domain-folding arrangements is limited to approximately 2000 and, to date, half of these have been described.
The physical separation of different portions of a polypeptide into domains often indicates that those domains have distinct functions. For example, Src (pronounced ‘sark’), a kinase that has a key role in intracellular signalling, has four domains: the catalytic activity of the protein resides in two domains (kinase domains) and the other two domains are important for the regulation of this activity (regulatory domains). The functional division of responsibility that domains permit will be examined in relation to some specific proteins later in this course.
Protein domains in recently evolved proteins are frequently encoded by individual exons within their genes. This observation suggests that such proteins have arisen during evolutionary history by exchange and duplication of exons coding for simpler individual protein modules, that is, structural or functional units that are common to many different proteins. This powerful evolutionary process has been termed ‘domain shuffling’. Figure 16 illustrates how domain shuffling has resulted in the evolution of specialised serine proteases such as factor IX, which is a component of the cascade that produces blood clots in vertebrates, and urokinase and plasminogen, both of which are involved in the lysis of blood clots. Compared with an evolutionarily older serine protease such as the digestive enzyme chymotrypsin, these ‘modern’ proteases have been refined by the acquisition of additional regulatory domains, such as the epidermal growth factor (EGF) domain, a calcium-binding domain or so-called ‘kringle’ domains.
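One way to picture domain shuffling is to treat each protein as an ordered list of reusable modules. The short Python sketch below is only an illustration of that idea: the `Protein` class, the helper functions and the domain lists are assumptions introduced here, and the lists are simplified (they are not a transcription of Figure 16 and omit some regions of the real proteins).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Protein:
    """A protein described as an ordered series of named modules (N- to C-terminus)."""
    name: str
    domains: Tuple[str, ...]


PROTEASE = "serine protease"

# Simplified, illustrative domain architectures (an assumption of this sketch).
chymotrypsin = Protein("chymotrypsin", (PROTEASE,))
factor_ix = Protein("factor IX", ("calcium-binding", "EGF", "EGF", PROTEASE))
urokinase = Protein("urokinase", ("EGF", "kringle", PROTEASE))
plasminogen = Protein("plasminogen", ("kringle",) * 5 + (PROTEASE,))


def acquired_domains(simpler: Protein, protein: Protein) -> List[str]:
    """Module types present in `protein` but absent from the simpler protein."""
    return sorted(set(protein.domains) - set(simpler.domains))


def shared_domains(a: Protein, b: Protein) -> List[str]:
    """Module types that two proteins have in common."""
    return sorted(set(a.domains) & set(b.domains))


if __name__ == "__main__":
    for p in (factor_ix, urokinase, plasminogen):
        print(p.name, "acquired:", acquired_domains(chymotrypsin, p))
    print("factor IX and urokinase share:", shared_domains(factor_ix, urokinase))
```

In this representation, every 'modern' protease keeps the catalytic module found in chymotrypsin and differs only in the regulatory modules placed in front of it, which is the pattern that domain shuffling would be expected to produce.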
Domains that have been particularly mobile in protein evolution tend to be smaller (40–200 residues) than the average domain, most likely reflecting physical limits on gene duplication. Typically, they have a core of β strands linked by large loops, which often form binding sites for regulatory molecules or substrates. The structures of some such modules are illustrated in Figure 17. Notice that, in the growth factor and immunoglobulin modules, the N- and C-terminal ends are at opposite sides of the structure. This arrangement is quite common among protein modules and makes it easier to link modules end to end in extended series.