5.1.1 What is DNA?
DNA (deoxyribonucleic acid) is frequently in the news for four main reasons.
DNA can be used in crime detection to eliminate innocent suspects from enquiries or, conversely, to identify the guilty with a very high degree of probability.
DNA is now used in medicine to assess the likelihood that an individual will develop a disease that has a genetic origin. This enables doctors to prescribe preventative treatments.
It is hoped that discoveries about DNA will yield important new treatments for hitherto intractable diseases and conditions.
DNA can be used to identify victims of disasters, and establish whether people are related.
Figure 12 illustrates the following characteristics of DNA.
DNA has the shape of an immensely long twisted ladder (the famous double helix) in which each pair of chemical bases in the strand can be thought of as a rung in the ladder.
It consists of pairs of chemical bases called adenine (A), cytosine (C), guanine (G) and thymine (T).
The bases (which in Figure 12 are colour coded) can only be paired according to the rules: A to T and C to G.
A ‘rung’ or pair of bases (e.g. A–T) is called a base pair.
A nucleotide is a single base plus its attached ‘structural’ molecules (a sugar and a phosphate group, which form the sides of the ladder).
Sequences of base pairs constitute genes, which are the sections of a DNA strand that form discrete units of heredity (influencing traits such as eye colour).
A complete DNA strand constitutes a chromosome (a human being has 46 of these combined into 23 pairs).
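The pairing rules above (A to T and C to G) mean that the bases on one side of the ladder fix the bases on the other. The short Python sketch below illustrates this under simplifying assumptions: a strand is represented as a plain string of letters, the example sequence is hypothetical (it is not taken from Figure 12), and the function names `partner_strand` and `base_pairs` are introduced here purely for illustration.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the base found on the opposite side of each rung, in order."""
    return "".join(PAIRING[base] for base in strand.upper())


def base_pairs(strand: str) -> list[str]:
    """List each rung of the ladder as a base pair such as 'A-T'."""
    return [f"{base}-{PAIRING[base]}" for base in strand.upper()]


example = "GATTACA"  # hypothetical sequence, chosen only for illustration
print(partner_strand(example))  # CTAATGT
print(base_pairs(example))
```

Applying `partner_strand` twice returns the original string, which reflects the fact that the pairing rules work in both directions.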
The four letters (A, C, G, and T) representing the DNA bases constitute ‘signs’ symbolising the building blocks of DNA. You can think of a set of signs as a code.
The English alphabet is a system of signs that consists of 26 letters, from A to Z. There are rules that govern how letters can form words in English. For example, the combination ‘m-s’ cannot be used to begin a word, but is acceptable within or at the end of a word. This limits the number of English words it is possible to form.
Words, and parts of words, can be combined to make longer words. For example, adding an ‘s’ to ‘dog’ makes ‘dogs’, and preceding ‘mill’ with ‘wind’ gives ‘windmill’. Rules also determine that ‘windmill’ is all right, but ‘millwind’ is not.
Considering these facts, how many words do you think the English language has?
Now think about things that can be said using the English language: utterances. These consist of words strung together according to a set of rules known as grammar.
How many utterances do you think it's possible to make in English?
A standard, reputable dictionary will have between 30,000 and 50,000 entries. Even this is only part of the story since most dictionaries do not include slang, dialect words or words that exist for only a very short period of time. Neither do they contain specialised vocabularies that exist in certain professional and trade groups (e.g. among doctors). Thus the likely total vocabulary of English is (at a guess) in excess of 100,000 words.
The number of utterances possible in English is virtually infinite. This is because, even given the rules of grammar, they can vary in length and word order.
Exercise 11 shows how a relatively simple code (signs like the alphabet) can be combined in simple and complex ways to produce an enormous variety of possible ‘products’ (utterances in English).
Think of the DNA bases (A, C, G, and T) as forming a code similar to the alphabet, i.e. four ‘signs’ that can be combined according to rules to form genes. The genes in turn are combined into structures called chromosomes (i.e. DNA strands) of which the human being has 46 in 23 pairs. Given this structure, a gene is analogous to an English word, a chromosome to a volume of English utterances, and all 23 pairs of chromosomes to the volumes of an encyclopedia.
At a guess, how many base pairs, like A–T, do you think the 23 pairs of human chromosomes have?
What might that answer tell you about how difficult a problem it is to develop a full understanding of the human genetic structure?
The longest human chromosome has about 263 million base pairs, the shortest 50 million. For all 23 pairs the total exceeds 3.2 billion (i.e. 3,200,000,000).
The base pairs in a gene can vary, which is what gives us genetic diversity. So the problem of trying to understand the genetic structure of humans is roughly analogous to trying to read and understand all the sentences in a huge, multi-volume encyclopedia!
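To get a feel for the scale of that total, the rough calculation below estimates how much computer storage it would take simply to write the sequence down. It is an idealised sketch, not a description of how genetic data is actually stored. It assumes that each of the four bases is equally likely, so that one base needs two bits, and that only one base of each pair has to be recorded because the pairing rules fix its partner. The figure of 3.2 billion is the total quoted above.

```python
import math

BASES = "ACGT"
BASE_PAIRS = 3_200_000_000  # total quoted in the text

# Telling apart 4 equally likely signs takes log2(4) = 2 bits (an assumption
# of this sketch; real sequences are not perfectly uniform).
bits_per_base = math.log2(len(BASES))

# One base per rung is enough: its partner follows from the pairing rules.
total_bits = BASE_PAIRS * bits_per_base
total_megabytes = total_bits / 8 / 1_000_000  # decimal megabytes

print(bits_per_base)    # 2.0
print(total_megabytes)  # 800.0
```

Writing the sequence down is the easy part. As the encyclopedia analogy suggests, the hard part is understanding what it means.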
These two Exercises demonstrate that having a simple code is no guarantee of a simple system! What can be produced depends not on the simplicity or complexity of the code, but on the possibilities for combining small parts and stringing them together to form larger products. In other words, simple elements of data can generate a huge amount of information.
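The growth in possibilities can be checked with a little arithmetic. With four signs, a sequence of n bases can be written in 4 × 4 × … × 4 (n times) different ways. The Python sketch below counts these combinations and, for short lengths, lists them. It treats every combination of letters as allowed, which is a simplifying assumption: it ignores any biological constraints on which sequences actually occur. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def count_sequences(length: int) -> int:
    """Number of distinct base sequences of the given length: 4 ** length."""
    return len(BASES) ** length


def all_sequences(length: int) -> list[str]:
    """Enumerate every sequence of the given length (short lengths only)."""
    return ["".join(letters) for letters in product(BASES, repeat=length)]


print(all_sequences(2))  # the 16 two-letter combinations, from 'AA' to 'TT'

for n in (1, 2, 3, 10, 20):
    print(n, count_sequences(n))
```

A sequence only ten bases long already has more than a million possible forms (4 to the power 10 is 1,048,576), even though the code itself has just four signs.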