9 Representing audio in binary
Just as we see the world smoothly, not in steps, so too we hear the world smoothly. However, to represent sound in binary we apply the same technique used for images: we break sound up into tiny units, each of which can be stored as a binary number.
If you shout, hit a piano key or drop a plate, then you set particles of air vibrating – and any ears in the vicinity will interpret this tremor as sound. The sounds we hear generally consist of small, rapid fluctuations in the pressure of the air that surrounds us. Sound can also be transmitted through other media, such as water, so not all sound consists of fluctuations in air pressure. However, this course considers only sound in air.
A microphone converts the changes in atmospheric pressure into an electrical signal whose voltage varies in step with the pressure of the original sound wave. This electrical signal then needs to be converted into a digital representation. There is no audio equivalent of a ‘pixel’; instead, the signal is measured (sampled) at regular intervals, and we talk of the sampling rate, which is the number of samples taken each second. Associated with each sample is its depth (how many bits are used to record the value at that sample). For example, CD audio samples are taken 44 100 times a second (which you may see written as 44.1 kHz), and each sample uses 16 bits for each of the two stereo channels. Hence the quality of a digital recording is determined by its sampling rate and sample depth.
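To make sampling rate and depth concrete, here is a minimal Python sketch. It assumes a pure 440 Hz sine tone as a stand-in for the microphone voltage; the function name `sample_tone` and its parameters are hypothetical, chosen only for illustration. The sketch measures the wave 44 100 times a second and rounds each measurement to a whole number that fits in a signed 16-bit sample.

```python
import math

def sample_tone(frequency_hz, duration_s, sample_rate=44_100, bit_depth=16):
    """Sample a pure sine tone and quantise each sample to a signed integer.

    The sine tone is an illustrative stand-in for a microphone voltage.
    """
    # Largest positive value a signed sample of this depth can hold (32767 for 16 bits)
    max_level = 2 ** (bit_depth - 1) - 1
    n_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of the nth sample, in seconds
        level = math.sin(2 * math.pi * frequency_hz * t)  # analogue level in [-1, 1]
        samples.append(round(level * max_level))  # quantise to a whole number
    return samples

# One hundredth of a second of a 440 Hz tone at CD settings
samples = sample_tone(440, 0.01)
print(len(samples))  # 441
# The 16-bit two's-complement bit pattern that would be stored for the second sample
print(format(samples[1] & 0xFFFF, "016b"))
```

Raising `sample_rate` takes more measurements per second, while raising `bit_depth` gives finer steps between the levels each measurement can record; both improve fidelity at the cost of more bits.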
As with image files, audio files can quickly become very large as the sampling rate and depth increase. Similarly, there are many standards for encoding and compressing audio files to make them more manageable, such as MP3 (MPEG-1 Audio Layer III, from the Moving Picture Experts Group), which is lossy, and FLAC (Free Lossless Audio Codec, an open-source format from the Xiph.Org Foundation), which is lossless, as well as many proprietary formats such as Microsoft’s WMA (Windows Media Audio).
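How quickly uncompressed audio grows can be estimated with simple arithmetic: bytes = sampling rate × depth × channels × seconds ÷ 8. The short Python sketch below is one way to do this; the function name `uncompressed_size_bytes` is hypothetical, the calculation ignores file headers and any compression, and the 96 kHz, 24-bit setting is chosen only as an example of a higher-quality recording.

```python
def uncompressed_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Bytes needed for raw, uncompressed samples (file headers ignored)."""
    bits = sample_rate * bit_depth * channels * seconds
    return bits // 8  # 8 bits per byte

# One minute of stereo audio at CD quality and at a higher setting
for rate, depth in [(44_100, 16), (96_000, 24)]:
    size = uncompressed_size_bytes(rate, depth, 2, 60)
    print(f"{rate} Hz, {depth}-bit: {size:,} bytes per minute")
```

At CD settings one minute of stereo audio needs 10,584,000 bytes (about 10.6 MB); at 96 kHz and 24 bits the same minute needs 34,560,000 bytes, which is why compressed formats are so widely used.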