4.4 Representing sound
Sound, such as speech or music, is an analogue physical quantity that varies with time, and so the ideas you have already met in Section 2.5 about converting analogue weights to digital form are relevant here too. In particular, samples of the sound will have to be taken, and each sample will have to be quantised to the nearest binary code in the digital representation.
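The two steps just mentioned, sampling and quantising, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name sample_and_quantise, the 100 Hz test tone, the 8000 samples per second rate and the 8-bit code length are all choices made for this example, and the signal is assumed to lie between -1.0 and 1.0.

```python
import math

def sample_and_quantise(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) at regular intervals and quantise each sample.

    Assumes signal(t) returns a value between -1.0 and 1.0.
    Returns a list of integer codes in the range 0 .. 2**bits - 1.
    """
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz          # time at which sample n is taken
        value = signal(t)               # the analogue value at that instant
        # Map -1.0..1.0 onto 0..levels-1 and round to the nearest code.
        code = round((value + 1.0) / 2.0 * (levels - 1))
        codes.append(code)
    return codes

def tone(t):
    """A hypothetical test signal: a pure tone that completes 100 cycles per second."""
    return math.sin(2 * math.pi * 100 * t)

# One hundredth of a second of the tone, 8000 samples per second, 8-bit codes.
codes = sample_and_quantise(tone, 0.01, 8000, 8)
print(len(codes), min(codes), max(codes))
```

Each entry in codes is the nearest available 8-bit code for the value of the signal at one sampling instant; using more bits per sample would give more levels and so a smaller quantisation error.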
It's important to appreciate that sound such as speech or music varies rapidly with time, and so samples of it will have to be taken at very closely spaced intervals if the digital representation is to be faithful to the original.
Before I can talk about how closely the samples must be spaced, I need to introduce the idea of the frequency of sound. A sound of high frequency is one that people hear as a high-pitched sound; a sound of low frequency is one that people hear as a low-pitched sound. Sound consists of air vibrations, and it is the rate at which the air vibrates that determines the frequency: a higher vibration rate is a higher frequency. So if the air vibrates at, say, 100 cycles per second then the frequency of the sound is said to be 100 cycles per second. The unit of 1 cycle per second is given the name ‘hertz’, abbreviated to ‘Hz’. Hence a frequency of 100 cycles per second is normally referred to as a frequency of 100 Hz.
So how often must the sound be sampled? There is a rule called the sampling theorem which says that if the frequencies in the sound range from 0 to B Hz then, for a faithful representation, the sound must be sampled at a rate greater than 2B samples per second.
The human ear can detect frequencies in music up to around 20 kHz (that is, 20 000 Hz). What sampling rate is needed for a faithful digital representation of music? What is the time interval between successive samples?
20 kHz is 20 000 Hz, and so the B in the text above the question is 20 000. The sampling theorem therefore says that the music must be sampled at more than 2 × 20 000 samples per second, which is more than 40 000 samples per second.
If 40 000 samples are taken each second, they are 1/40 000 of a second apart. This is 0.000025 seconds, which is 0.025 milliseconds (thousandths of a second) or 25 microseconds (millionths of a second). Since the sampling rate must be more than 40 000 samples per second, the interval between samples must be slightly less than this.
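The arithmetic in this answer can be checked with a short Python sketch. The function names minimum_sampling_rate and sample_interval_us are chosen only for this illustration. Note that the first function returns the bound 2B itself; the sampling theorem requires the actual rate to be greater than this value.

```python
def minimum_sampling_rate(max_frequency_hz):
    """Return 2B, the bound that the sampling rate must exceed (samples per second)."""
    return 2 * max_frequency_hz

def sample_interval_us(sample_rate_hz):
    """Return the time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz

bound = minimum_sampling_rate(20_000)      # B = 20 000 Hz for music
interval = sample_interval_us(bound)

print(bound)     # 40000
print(interval)  # 25.0
```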
The answer to Example 5 shows the demands made on a computer if music is to be faithfully represented. Samples of the music must be taken at intervals of less than 25 microseconds. And each of those samples must be stored by the computer.
If speech is to be represented then the demands can be less stringent, first because the frequency range of the human voice (up to only about 12 kHz) is smaller than that of music and second because speech is recognisable even when its frequency range is quite severely restricted. (For example, some digital telephone systems sample at only 8000 samples per second, thereby cutting out most of the higher-frequency components of the human voice, yet we can make sense of what the speaker on the other end of the phone says, and even recognise their voice.)
Activity 21 (Self assessment)
Five minutes of music is sampled at 40 000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting music file be?
Five minutes of speech is sampled at 8000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting speech file be?
5 minutes = 300 seconds. So there are 300 × 40 000 samples. Each sample occupies 2 bytes, making a file size of 300 × 40 000 × 2 bytes, which is 24 000 000 bytes – some 24 megabytes!
A sampling rate of 8000 per second will generate a fifth as many samples as a rate of 40 000 per second. So the speech file will ‘only’ be 4 800 000 bytes.
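The two file sizes above can be reproduced with a few lines of Python. This is a sketch that assumes the samples are stored one after another with nothing else in the file (no header and no compression); the function name uncompressed_size_bytes is chosen only for this illustration.

```python
def uncompressed_size_bytes(duration_s, sample_rate_hz, bytes_per_sample):
    """Size of the raw sample data: number of samples times bytes per sample."""
    return duration_s * sample_rate_hz * bytes_per_sample

five_minutes = 5 * 60  # 300 seconds

music_bytes = uncompressed_size_bytes(five_minutes, 40_000, 2)
speech_bytes = uncompressed_size_bytes(five_minutes, 8_000, 2)

print(music_bytes)   # 24000000
print(speech_bytes)  # 4800000
```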
Your answer to Activity 21 has probably convinced you that speech and, especially, music files are not the sort of thing you wish to send as an email attachment! Fortunately there is a compression technique that can be used for sound files. It is known as MP3, which is short for ‘MPEG-1 Audio Layer 3’, indicating that it is a compression technique defined in the first version of the MPEG standard. Using MP3, compression ratios of up to about 12 can be achieved without any noticeable degradation of the sound quality. Higher compression ratios can be achieved if some loss of quality can be tolerated – as much as 100 if telephone-quality speech is acceptable.
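To get a feel for what these compression ratios mean, the following Python sketch applies them to the file sizes from Activity 21. It is rough arithmetic only: the ratios of 12 and 100 are the approximate figures quoted above, the function name compressed_size_bytes is chosen for this illustration, and the size of a real MP3 file depends on the encoder and its settings.

```python
def compressed_size_bytes(original_bytes, compression_ratio):
    """Approximate size after compression: original size divided by the ratio."""
    return original_bytes / compression_ratio

music_bytes = 24_000_000   # five minutes of music, from Activity 21
speech_bytes = 4_800_000   # five minutes of speech, from Activity 21

music_mp3 = compressed_size_bytes(music_bytes, 12)     # no noticeable loss of quality
speech_mp3 = compressed_size_bytes(speech_bytes, 100)  # telephone-quality speech

print(music_mp3)   # 2000000.0
print(speech_mp3)  # 48000.0
```

So, on these figures, the 24-megabyte music file shrinks to about 2 megabytes, and the speech file to about 48 000 bytes if telephone quality is acceptable.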