3.2 Protein structure
Proteins are one example of a biopolymer. You will already be familiar with synthetic polymers such as polyethylene and nylon: long chains made up of many thousands of repeating units, called monomers, linked together by strong covalent bonds. Polymers are particularly versatile materials because of the very different strengths of the bonds between monomer units in the chain (strong) and between one chain and another (weak). By varying the arrangement of chains within a material a huge range of properties can be achieved, from extended chain fibres as strong as steel, to soft rubbery networks where the chains are randomly coiled together, or rigid networks where cross-linking keeps every chain tightly pinned in place. The same principles apply to biopolymers, although the chemical composition of the chains tends to be rather more complex.
Despite the huge number and diversity of proteins in cells, they are all based on the same relatively small number of fundamental building blocks: 20 different amino acids. Amino acids are a class of compounds with the same basic chemical structure. Figure 9 offers a diagrammatic representation and Figure 10 shows the chemical structure.
At the centre of each amino acid is a carbon atom, which forms bonds to four other groups. Three of these groups are always the same. Two of them can react together to link individual units into a chain, while the third common group is a hydrogen atom. What distinguishes one amino acid from another is the fourth group, known as the R group, shown as a hook in Figure 9. There are many possibilities for R, but only 20 of them are found in proteins in nature, at least on this planet. Figure 11 shows a few amino acid units linked together, each with a different R unit, and Figure 12 shows the chemical bonds involved.
The amino acid units are linked together along the chain by strong covalent bonds, which I have represented in Figure 11 by tightly interlocking pieces. The bonds formed are exactly the same as those that link the units together in synthetic nylons, although (for no particularly good reason) the terminology used is different. In nylons, the monomers are usually referred to as amides, the polymer as a polyamide, and the bonds linking the monomer units together as amide bonds. Biochemists are more likely to refer to amino acids, polypeptides (the protein chains) and peptide bonds.
Why have I represented some of the R units by hooks? As shown in Figure 11, the R units protrude from the side of the polypeptide chain. Some of them are very small (e.g. a hydrogen atom) while some are much larger. The side groups play an important role in the behaviour of a protein because they can form weak bonds with other units, and help to stabilise particular folded-chain arrangements. The different hooks are intended to show that the side units can link together in different ways, but that these links are much weaker than the links along the chain.
Protein chains may be many hundreds of units long, so the number of possible sequences is enormous. The thousands of different proteins found in cells represent just a small fraction of these possibilities; they have evolved over time to perform specific biological functions. Some amino acids may not be used at all in a particular protein, whereas others may occur many times. Each different protein is unique because it has its own individual sequence of amino acids along its length. The sequence of amino acids that makes up a particular protein chain is known as the primary structure of the protein.
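Because a primary structure is simply an ordered list of amino acids, it can be written as a string of letters and examined with a few lines of code. The sketch below is illustrative only. It assumes the conventional one-letter amino acid codes, which are not introduced in this section, and the short fragment used is a made-up example rather than a real protein. The function names `composition` and `unused` are likewise hypothetical.

```python
from collections import Counter

# Assumption: the 20 amino acids found in proteins, written with the
# conventional one-letter codes (not introduced in the text itself).
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(primary_structure: str) -> Counter:
    """Count how often each amino acid occurs in a sequence."""
    unknown = set(primary_structure) - set(STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Not a standard amino acid code: {sorted(unknown)}")
    return Counter(primary_structure)


def unused(primary_structure: str) -> list[str]:
    """List the amino acids that do not appear in the sequence at all."""
    return sorted(set(STANDARD_AMINO_ACIDS) - set(primary_structure))


# Hypothetical ten-unit fragment, chosen only for illustration.
fragment = "GPAGPAGKPG"
counts = composition(fragment)

print(counts.most_common(3))  # the three most frequent amino acids
print(unused(fragment))       # amino acids this fragment never uses
```

Even in this tiny fragment, one amino acid occurs four times while most of the 20 do not occur at all, which is the pattern described above for real proteins.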
Biochemists use the terms ‘protein’ and ‘polypeptide’ rather loosely and interchangeably, and both terms will be used here as well. When a protein is synthesised in a cell, the linear polypeptide chain folds as it is produced. The biological activity of this newly formed molecule depends on its three-dimensional, folded structure, and when considering a polypeptide chain as a three-dimensional structure, it is generally referred to as a protein.
The sequence of amino acids along a polypeptide chain defines the primary structure of the protein, but the chain is flexible and has the potential to fold up in different ways. In practice, only a very small proportion of the possible folding patterns are energetically favourable, stabilised by interactions between the ‘hooks’ protruding from the main chain. The spatial arrangement, or conformation, adopted by a chain is known as its higher-order structure, and different levels of organisation can be identified. Indeed, one of the remarkable things about proteins is that any chain with the same sequence of amino acids will fold up in exactly the same way, provided the conditions are the same, although a bit of help is sometimes needed to achieve this. Thus, primary structure determines higher-order structure, which in turn, as we shall see later, determines function.
Collagen is a particularly abundant fibrous protein that accounts for 25 per cent of all body protein. It has high tensile strength and is found particularly in skin and tendons. The higher-order structure of collagen is illustrated in Figure 13. The basic structural unit is a triple helix, composed of three polymer chains (which are themselves helical) coiled round each other and held together by weak, non-covalent bonds, as shown in Figure 13(a). These units are then packed side by side in small groups of up to 200 to form larger fibrils, Figure 13(b), which are bundled together to form fibres, Figure 13(c). The fibres may themselves clump together in larger aggregates to build up body tissues. The result of all this organisation is a very strong, reasonably stiff, tough material.
One of the most powerful methods available for working out the structures of proteins is X-ray diffraction: a technique that relies on the scattering of X-rays by a regular crystal lattice. It is not always easy to persuade a protein to crystallise, but as techniques improve and higher-intensity X-ray sources and more powerful analytical tools become available, more and more proteins are being characterised in this way. Computer modelling is then applied to translate a diffraction pattern into a likely molecular structure. The Protein Data Bank (PDB) is a website maintained by the Research Collaboratory for Structural Bioinformatics (RCSB), which acts as a central repository for results in this field and provides an excellent example of international cooperation between scientists. Structures for most of the proteins mentioned in this block, and interesting articles about many of them, can be found on the PDB website.
Collagen provides us with a good example of how protein molecules can self-assemble to produce a structure well suited for a particular purpose. It can also be used to demonstrate the limitations of self-assembly. The three intertwined strands that make up the basic unit of collagen are created and assembled simultaneously within the cell. If we separate the strands – which we can do very easily by heating collagen in water above 70 °C – and then allow them to re-assemble, the triple helices will not re-form. Instead, separate strands interact with one another to form many links with other molecules and the result is a large, open network. This is exactly what happens when you make a jelly. Breaking down the natural structure of a protein is known as denaturing. The individual molecules from the denatured collagen are what we call gelatin. When a jelly sets, the chains link together to form a huge, continuous, three-dimensional network like a giant cage, which traps the water inside through hydrogen bonding to the protein molecules.
Consider a short segment of protein chain, just six monomers long.
(a) What is meant by the primary structure of the chain?
(b) How many different types of monomer are available in nature?
(c) How many different sequences are possible in the six-monomer segment? Assume that all the possible monomers are available at each of the six positions.
(a) The sequence of amino acids along the chain.
(b) There are 20 naturally occurring amino acids.
(c) At each position along the chain there are 20 possibilities. So the number of possible sequences is 20 × 20 × 20 × 20 × 20 × 20 = 20⁶ = 64 000 000.
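The count in part (c) can be checked with a short calculation, and the same calculation shows how quickly the number grows for longer chains. This is a minimal sketch, assuming (as in the question) that any of the 20 amino acids may occupy each position. The function name `count_sequences` and the 300-unit chain length are illustrative choices, not values taken from the text.

```python
# Assumption from the question: each position can hold any of 20 amino acids.
AMINO_ACID_TYPES = 20


def count_sequences(chain_length: int) -> int:
    """Number of distinct sequences for a chain of the given length."""
    if chain_length < 0:
        raise ValueError("chain_length must not be negative")
    return AMINO_ACID_TYPES ** chain_length


# The six-monomer segment from the question.
print(count_sequences(6))  # 64000000

# A hypothetical chain a few hundred units long: the count is far too
# large to write out usefully, so report how many digits it has instead.
print(len(str(count_sequences(300))))  # 391
```

A chain of only 300 units already has a number of possible sequences that runs to 391 digits, which is why the proteins actually found in cells can be only a tiny fraction of the possibilities.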
Regular helical structures like those that occur in collagen are very common in proteins, because they allow many weak bonds to form and these stabilise the structure. Figure 14 shows a schematic representation of two particularly common regular folding patterns, the α-helix and the β-sheet.
Keratin is another important example of a fibrous protein, with a different amino acid sequence. It can exist in different forms, based either on α-helices (in mammals) or β-sheets (in birds and reptiles). It is a major component of skin, hair, nails, hooves, horns, claws, beaks, feathers and scales, which gives a good indication of both its versatility and its strength.
More disordered network structures can also be produced, by incorporating segments in the polypeptide chain that can form cross-links to neighbouring chains. This results in a rubber-like structure with high extensibility; elastin, a key component of skin, has this structure.