<?xml version="1.0" encoding="UTF-8"?>
<?sc-transform-do-oumusic-to-unicode?>
<?sc-transform-do-oxy-pi?>
<Item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="X_TM355_OpenLearn_WEB050368" TextType="CompleteItem" SchemaVersion="2.0" PageStartNumber="0" Template="Generic_A4_Unnumbered" DiscussionAlias="Discussion" SecondColour="None" ThirdColour="None" FourthColour="None" Logo="colour" Rendering="OpenLearn" xsi:noNamespaceSchemaLocation="http://www.open.edu/openlearn/ocw/mod/oucontent/schemas/v2_0/OUIntermediateSchema.xsd" x_oucontentversion="2024042601">
  <meta name="aaaf:olink_server" content="http://www.open.edu/openlearn/ocw"/>
  <meta content="false" name="vle:osep"/>
  <meta content="mathjax" name="equations"/>
  <!--ADD CORRECT OPENLEARN COURSE URL HERE:<meta name="dc:source" content="http://www.open.edu/openlearn/education/educational-technology-and-practice/educational-practice/english-grammar-context/content-section-0"/>-->
  <CourseCode>TM355_1</CourseCode>
  <CourseTitle>Exploring communications technology</CourseTitle>
  <ItemID/>
  <ItemTitle>Exploring communications technology</ItemTitle>
  <FrontMatter>
    <Imprint>
      <Standard>
        <GeneralInfo>
          <Paragraph><b>About this free course</b></Paragraph>
          <Paragraph>This free course is an adapted extract from the Open University course TM355 <i>Communications technology</i> <a href="http://www.openuniversity.edu/courses/modules/tm355">www.openuniversity.edu/courses/modules/tm355</a>.</Paragraph>
          <Paragraph>This version of the content may include video, images and interactive content that may not be optimised for your device. </Paragraph>
          <Paragraph>You can experience this free course as it was originally designed on OpenLearn, the home of free learning from The Open University – <a href="http://www.open.edu/openlearn/science-maths-technology/exploring-communications-technology/content-section-0">http://www.open.edu/openlearn/science-maths-technology/exploring-communications-technology/content-section-0</a></Paragraph>
          <!--[course name] hyperlink to page URL make sure href includes http:// with trackingcode added <Paragraph><a href="http://www.open.edu/openlearn/money-management/introduction-bookkeeping-and-accounting/content-section-0?LKCAMPAIGN=ebook_&amp;amp;MEDIA=ol">www.open.edu/openlearn/money-management/introduction-bookkeeping-and-accounting/content-section-0</a>. </Paragraph>-->
          <Paragraph>There you’ll also be able to track your progress via your activity record, which you can use to demonstrate your learning.</Paragraph>
        </GeneralInfo>
        <Address>
          <AddressLine/>
          <AddressLine/>
        </Address>
        <FirstPublished>
          <Paragraph/>
        </FirstPublished>
        <Copyright>
          <Paragraph>Unless otherwise stated, copyright © 2017 The Open University, all rights reserved.</Paragraph>
        </Copyright>
        <Rights>
          <Paragraph/>
          <Paragraph><b>Intellectual property</b></Paragraph>
          <Paragraph>Unless otherwise stated, this resource is released under the terms of the Creative Commons Licence v4.0 <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_GB">http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_GB</a>. Within that The Open University interprets this licence in the following way: <a href="http://www.open.edu/openlearn/about-openlearn/frequently-asked-questions-on-openlearn">www.open.edu/openlearn/about-openlearn/frequently-asked-questions-on-openlearn</a>. Copyright and rights falling outside the terms of the Creative Commons Licence are retained or controlled by The Open University. Please read the full text before using any of the content. </Paragraph>
          <Paragraph>We believe the primary barrier to accessing high-quality educational experiences is cost, which is why we aim to publish as much free content as possible under an open licence. If it proves difficult to release content under our preferred Creative Commons licence (e.g. because we can’t afford or gain the clearances or find suitable alternatives), we will still release the materials for free under a personal end-user licence. </Paragraph>
          <Paragraph>This is because the learning experience will always be the same high quality offering and that should always be seen as positive – even if at times the licensing is different to Creative Commons. </Paragraph>
          <Paragraph>When using the content you must attribute us (The Open University) (the OU) and any identified author in accordance with the terms of the Creative Commons Licence.</Paragraph>
          <Paragraph>The Acknowledgements section is used to list, amongst other things, third party (Proprietary), licensed content which is not subject to Creative Commons licensing. Proprietary content must be used (retained) intact and in context to the content at all times.</Paragraph>
          <Paragraph>The Acknowledgements section is also used to bring to your attention any other Special Restrictions which may apply to the content. For example there may be times when the Creative Commons Non-Commercial Sharealike licence does not apply to any of the content even if owned by us (The Open University). In these instances, unless stated otherwise, the content may be used for personal and non-commercial use.</Paragraph>
          <Paragraph>We have also identified as Proprietary other material included in the content which is not subject to Creative Commons Licence. These are OU logos, trading names and may extend to certain photographic and video images and sound recordings and any other material as may be brought to your attention.</Paragraph>
          <Paragraph>Unauthorised use of any of the content may constitute a breach of the terms and conditions and/or intellectual property laws.</Paragraph>
          <Paragraph>We reserve the right to alter, amend or bring to an end any terms and conditions provided here without notice.</Paragraph>
          <Paragraph>All rights falling outside the terms of the Creative Commons licence are retained or controlled by The Open University.</Paragraph>
          <Paragraph>Head of Intellectual Property, The Open University</Paragraph>
        </Rights>
        <Edited>
          <Paragraph/>
        </Edited>
        <Printed>
          <Paragraph/>
        </Printed>
        <ISBN>WEB 05036 8<!--INSERT EPUB ISBN WHEN AVAILABLE (.kdl)-->
        <!--INSERT KDL ISBN WHEN AVAILABLE (.epub)--></ISBN>
        <Edition>1.1</Edition>
      </Standard>
    </Imprint>
    <Covers>
      <Cover template="false" type="ebook" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_epub_1400x1200_vector.jpg"/>
      <Cover template="false" type="A4" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_pdfimage_19x12-6_300d.jpg"/>
    </Covers>
  </FrontMatter>
  <Unit>
    <UnitID/>
    <UnitTitle>Communications</UnitTitle>
    <Session id="__introduction">
      <Title>Introduction</Title>
      <Paragraph>Modern communication technology amalgamates many areas of knowledge such as electronics, radio-frequency engineering, information theory, cryptography, and signal processing. Nevertheless, some basic principles and recurring themes underpin much communications technology, and this free course, <i>Exploring communications technology</i>, concentrates on them. The first section focuses on digital modulation and on the widely used technique of Quadrature Amplitude Modulation (QAM). The second section looks at error control, and the third at data compression. The fourth and final section looks at the way orthogonal frequency division multiplexing (OFDM) underpins fourth generation mobile communications (4G), wi-fi and broadband.</Paragraph>
      <Paragraph>This OpenLearn course is an adapted extract from the Open University course <a href="http://www.openuniversity.edu/courses/modules/tm355">TM355 <i>Communications technology</i></a>.</Paragraph>
    </Session>
    <Session id="__learningoutcomes">
      <Title>Learning outcomes</Title>
      <Paragraph>After studying this course, you should be able to:</Paragraph>
      <BulletedList>
        <ListItem><Paragraph>demonstrate an understanding of the principles of, and reasons for, carrier-wave modulation, and its application to Quadrature Amplitude Modulation</Paragraph></ListItem>
<ListItem><Paragraph>demonstrate an understanding of the roles of error detection and error correction, and be able to perform calculations involving check digits and coding rates</Paragraph></ListItem>
        <ListItem><Paragraph>demonstrate an understanding of the basic techniques of perceptual coding and the reasons for their use</Paragraph></ListItem>
<ListItem><Paragraph>demonstrate an understanding of how the use of multiple OFDM subchannels enables efficient use of communications channels in mobile communications, broadband and wi-fi.</Paragraph></ListItem>
      </BulletedList>
    </Session>
    <Session>
      <Title>1 Signals and modulation </Title>
<Paragraph>Modern communication, whether by smartphone, computer networks, or broadcast TV and radio, presents many challenges.</Paragraph>
      <BulletedList>
        <ListItem>There is generally a demand for faster broadband, faster mobile data, higher definition TV and video, etc. In the terminology of the subject, there is a demand for higher capacity links.</ListItem>
        <ListItem>Users of a shared medium (such as mobile communications) must somehow be kept separate from each other. </ListItem>
        <ListItem>Errors in transmission, which are unavoidable, should be minimised.</ListItem>
        <ListItem>Coverage can be limited, particularly with mobile devices.</ListItem>
      </BulletedList>
      <Paragraph>Solutions to these problems can conflict with each other. For example, increasing the power of a broadcast transmitter improves coverage and reduces the possibility of errors, but might adversely affect other services in the area. In practice, therefore, compromises usually have to be found, which is typical of the engineering approach to problem-solving. This OpenLearn course looks at some of the theoretical background to the technology of modern communications. It is an adapted extract from the Open University course TM355, <a href="http://www.openuniversity.edu/courses/modules/tm355"><i>Communications Technology</i></a>.</Paragraph>
      <Paragraph>In this introductory audio, Allan Jones talks to Adrian Poulton and Helen Donelan, the authors of Section 1, about issues related to this topic.</Paragraph>
      <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_001-640x360.mp4" type="video" width="512" x_manifest="tm355_openlearn_001_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="21c9391e" x_subtitles="tm355_openlearn_001-640x360.srt">
        <Transcript>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Hello. I’m Allan Jones, and I helped to put together this short course in communications. Throughout the course, I'll be talking to people who’ve contributed in one way or another. The first contributor is Helen Donelan, who was the main author of the material in Section 1. Hello, Helen. </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>Hello. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>A fundamental concept in telecommunications is the sine wave and also the cosine wave. What are these? </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>Oh, well, sine and cosine waves are really the same as each other. They’re a type of smooth, repeating waveform. And they will generally represent the way many natural phenomena vary in time and space. They have this characteristic smooth undulating shape which repeats itself. </Remark>
          <Remark>You can turn them into sound. And if you do, they have a very characteristic sound that people often describe as pure. This is an example. </Remark>
          <Remark>[BEEP] </Remark>
          <Remark>This is a bigger one. </Remark>
          <Remark>[LOUDER BEEP] </Remark>
          <Remark>This is one where the shape is repeating slowly. </Remark>
          <Remark>[LOWER-PITCHED BEEP] </Remark>
          <Remark>Finally, this is one that repeats very rapidly. </Remark>
          <Remark>[HIGH-PITCHED BEEP] </Remark>
          <Remark>The light from a very pure source, like a laser, would be sinusoidal. And by that, we mean it has the shape of a sine or a cosine wave, if you could see its variations. Radio waves have the same shape prior to modulation. Sine waves also are related to rotation. So the voltage from a rotating generator of electricity fluctuates with the shape of a sine wave. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Why are these waves so important in the context of communications? </Remark>
          <Speaker>HELEN DONELAN</Speaker>
<Remark>Well, at the basic level, signals can be discussed in terms of the properties of sine waves, the three main properties being frequency, amplitude, and phase. So sines and cosines are the basis of a lot of the vocabulary that we use. </Remark>
          <Remark>But the importance of sine waves does go beyond that. You can combine sine waves to create almost all common periodic wave shapes, for example, the square wave, which can be used to represent digital data. So it can be helpful to think of complicated waves, such as a square wave, in terms of the sine waves that you’ve combined to make the wave, so adding different sine waves to make that more complicated wave. </Remark>
          <Remark>So for example, any wave with sharp corners or sudden transitions, going back to the square wave again, contains a lot of high frequency sine waves. We know that because when you synthesise that sort of wave from sine waves, you find it’s high frequency sine waves that give you the sharp corners and the sudden transitions. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK, so maybe that answer has a bearing on my next question. In this subject, people often use the terms time domain and frequency domain. For instance, they might talk about the time domain representation of a signal. And this might be contrasted with its frequency domain representation. What do these terms mean? </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>Well, the time domain representation is the one that we’re probably most familiar with. So if you think back to the square wave again, maybe representing alternating ones and zeros in digital data, then the wave is that one for a period of time. Then it drops to a zero, and then it goes back to a one, and so on. And this is what we mean by the time domain description of a signal. </Remark>
          <Remark>What you can do is think of the same square wave in terms of the sine waves and their frequencies that you would need to add together in order to synthesise it. So those sine waves and the proportions in which they are combined will be the frequency domain representation of a square wave. </Remark>
          <Remark>One representation is sometimes more useful than the other. So for example, if you phone a bank or another business, often the first thing you get is a series of recorded questions where you have to answer yes or no or maybe say your date of birth. At the other end, there's a speech recognition system. What this is usually doing is analysing your response in the frequency domain. And it does this by looking at how the power in your speech is distributed in different frequency bands. This is how it works out what syllables it is you are saying. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Thanks. </Remark>
        </Transcript>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_001-320x176_still.jpg" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_001-320x176_still.jpg" width="100%" x_folderhash="596a8165" x_contenthash="05574fcf" x_imagesrc="tm355_openlearn_001-320x176_still.jpg" x_imagewidth="512" x_imageheight="317"/>
        </Figure>
      </MediaContent>
      <Section>
        <Title>1.1 Periodic signals</Title>
        <Paragraph>Fundamental to communications is the analogue signal known as the <b>sinusoid</b> or sine wave, shown in Figure 1. Sinusoids are important not only because they turn up naturally in a wide variety of situations, but also for their mathematical simplicity. Figure 1 shows a sinusoidal changing voltage, but other properties can change in this way, such as current, power, pressure, and so on.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt1_f006.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt1_f006.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="550e607a" x_imagesrc="tm355_bk1_pt1_f006.eps.jpg" x_imagewidth="446" x_imageheight="220"/>
          <Caption><b>Figure 1</b> A sinusoidal signal</Caption>
          <Description><Paragraph>This graph depicts a waveform known as a sine wave. It shows a series of regularly spaced, alternating peaks and troughs. The line graph has the Y axis labelled as voltage and the X axis labelled as time. There is no scale indicated on either axis. The peaks are above the X axis and are all at the same height. The troughs are all below the X axis and are the same depth. </Paragraph><Paragraph>The line graph starts at the point where the X axis meets the Y axis and rises smoothly to a rounded peak. The line then drops smoothly down to a rounded trough the same distance below the X axis. Then it rises again to the next peak, which is the same height as the previous peak, then drops again to a trough of the same depth. This continues for three cycles in total.</Paragraph><Paragraph>The vertical distance from a peak to the X axis is labelled as amplitude. The horizontal distance between peaks is labelled as one cycle. </Paragraph></Description>
        </Figure>
        <Paragraph>A sinusoid is an example of a <b>periodic signal</b>. It repeats at regular time intervals. Any non-sinusoidal periodic signal can be regarded as a sum of sinusoids. </Paragraph>
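<Paragraph>As a rough numerical illustration of this idea (a sketch added for this edition, not part of the original course materials), the following Python fragment builds an approximate square wave by summing its odd sine-wave harmonics; the 1/<i>k</i> amplitudes follow the standard Fourier series of a square wave.</Paragraph>

```python
import numpy as np

def square_from_sines(t, f, n_harmonics=25):
    """Approximate a square wave of frequency f (Hz) by summing
    odd harmonics k*f with amplitude 1/k (its Fourier series)."""
    total = np.zeros_like(t)
    for k in range(1, 2 * n_harmonics, 2):   # k = 1, 3, 5, ...
        total += np.sin(2 * np.pi * k * f * t) / k
    return (4 / np.pi) * total

t = np.linspace(0, 0.01, 1000, endpoint=False)   # 10 ms of time points
approx = square_from_sines(t, f=200)             # 200 Hz square wave
```

<Paragraph>Away from the transitions the sum sits close to +1 or −1. Adding more harmonics sharpens the corners, which is why waves with sharp corners contain a lot of high-frequency sine waves.</Paragraph>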
        <Paragraph>A section of a periodic signal between two consecutive maxima (or any other corresponding points) is called a <b>cycle</b>. The duration of a cycle is the <b>period</b>. The number of cycles in one second is the <b>frequency</b>. The unit of frequency is the hertz (Hz), where 1 Hz = 1 cycle per second. If <i>f</i> is the frequency in Hz and <i>T</i> is the period in seconds, then:</Paragraph>
        <Equation>
          <MathML>
            <math xmlns="http://www.w3.org/1998/Math/MathML">
              <mrow>
                <mi>f</mi>
                <mo>=</mo>
                <mfrac>
                  <mn>1</mn>
                  <mi>T</mi>
                </mfrac>
              </mrow>
            </math>
          </MathML>
        </Equation>
        <Paragraph>and</Paragraph>
        <Equation>
          <MathML>
            <math xmlns="http://www.w3.org/1998/Math/MathML">
              <mrow>
                <mi>T</mi>
                <mo>=</mo>
                <mfrac>
                  <mn>1</mn>
                  <mi>f</mi>
                </mfrac>
              </mrow>
            </math>
          </MathML>
        </Equation>
        <Paragraph>Also shown in Figure 1 is the <b>amplitude</b>, the maximum value of the sinusoid.</Paragraph>
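<Paragraph>These reciprocal relationships are simple enough to check numerically. The short Python sketch below is added here for illustration (the function names are ours, not part of the course):</Paragraph>

```python
def period_from_frequency(f_hz):
    """Period in seconds of a periodic signal with frequency f_hz in hertz."""
    return 1.0 / f_hz

def frequency_from_period(t_s):
    """Frequency in hertz of a periodic signal with period t_s in seconds."""
    return 1.0 / t_s
```

<Paragraph>For example, a 50 Hz mains voltage has a period of 1/50 = 0.02 s, and a signal with a period of 1 ms has a frequency of 1000 Hz.</Paragraph>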
        <Activity>
          <Heading>Activity 1 Self assessment</Heading>
          <Question>
            <Paragraph>A sine wave has a frequency of 25 000 Hz (25 kHz). What is its period?</Paragraph>
          </Question>
          <Answer>
            <Paragraph><i>T</i> = <InlineEquation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                    <mrow>
                      <mfrac>
                        <mn>1</mn>
                        <mi>f</mi>
                      </mfrac>
                    </mrow>
                  </math></MathML><Image/></InlineEquation>, so <i>T</i> = <InlineEquation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                    <mrow>
                      <mfrac>
                        <mn>1</mn>
                        <mrow>
                          <mn>25000</mn>
                        </mrow>
                      </mfrac>
                    </mrow>
                  </math></MathML><Image/></InlineEquation>s = 0.00004 s or 40 µs.</Paragraph>
          </Answer>
        </Activity>
        <Paragraph>Sinusoidal signals also have <b>phase</b>. This relates to the part of a cycle that the sinusoid has reached at a particular time. In Figure 1, for example, at zero time the signal is zero and rising. Shifting the signal to the right or left changes its phase. Phase is measured in degrees or radians, and ranges from 0° to 360° (0 to 2π radians).</Paragraph>
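<Paragraph>Combining the three properties, a sinusoid that is zero and rising at zero time can be written <i>v</i>(<i>t</i>) = <i>A</i> sin(2π<i>ft</i> + <i>φ</i>), where <i>A</i> is the amplitude, <i>f</i> the frequency and <i>φ</i> the phase. The following Python sketch (an illustration added here, not part of the course materials) samples such a sinusoid:</Paragraph>

```python
import math

def sinusoid(t, amplitude, frequency_hz, phase_rad=0.0):
    """Value at time t (seconds) of A * sin(2*pi*f*t + phi)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t + phase_rad)

# A 200 Hz sinusoid of amplitude 0.5:
v_start = sinusoid(0.0, 0.5, 200)                  # zero and rising at t = 0
v_peak = sinusoid(1 / (4 * 200), 0.5, 200)         # maximum a quarter-cycle later
v_advanced = sinusoid(0.0, 0.5, 200, math.pi / 2)  # 90 degree advance: starts at its peak
```

<Paragraph>A phase advance of 90° (π/2 radians) gives the same starting value that the unshifted sinusoid only reaches a quarter of a cycle later.</Paragraph>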
        <Activity>
          <Heading>Activity 2 Exploratory</Heading>
          <Question>
            <Paragraph>This activity demonstrates how a sinusoid can be generated, by measuring the height of a rotating line. </Paragraph>
            <Paragraph>It will allow you to explore the different features of the sinusoidal waveforms you have just been reading about, so that you can become more familiar with how sinusoids can be created and what happens when the different properties of sinusoids are altered. </Paragraph>
            <Paragraph>The activity has been pre-loaded with the required settings to create a basic sine wave. When you click on ‘Generate’, you will see how a sine wave can be created by rotating a line of length <i>a</i> at a constant speed about a fixed point O. As the line rotates, you can see how the variation of the line marked <i>y</i> plotted against time traces out the shape of a sine wave. The frequency, amplitude and phase of the sinewave can be changed using the sliders at the top. Changes can only be made when the rotating line is stopped.</Paragraph>
            <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/sinewave_generator.zip" width="512" height="500" type="html5" id="x_imd_act_1" x_folderhash="596a8165" x_contenthash="75a6d29d" x_xhtml="y">
              <Description><Paragraph>There are two main boxes illustrated. The box at the top is smaller and contains a figure with an X axis and a Y axis. The X axis extends from minus ten to plus ten and is labelled t (for time) with units of milliseconds. The Y axis extends from minus one to plus one and is labelled V (for voltage) with units of volts. Plotted on these axes is a sine wave. On first opening the activity, there are four full cycles of the sine wave illustrated and the amplitude is 0.5 volts. </Paragraph><Paragraph>To the right of the box containing this figure there are three sliding scales. </Paragraph><Paragraph>The first sliding scale is labelled ‘Frequency’ and on opening the activity the value is displayed as 200 hertz. The slider enables values from 100 hertz up to 1000 hertz to be selected, in increments of 10.</Paragraph><Paragraph>The second sliding scale is labelled ‘Amplitude’ and on opening the activity the value is displayed as 0.50. The slider enables values from 0.10 up to 1.00 to be selected, in increments of 0.01.</Paragraph><Paragraph>The third sliding scale is labelled ‘Phase’ and on opening the activity the value is displayed as 0 (and, below this in brackets, ‘0 degrees’). The slider enables values from minus two pi (or minus 360 degrees) up to plus two pi (or plus 360 degrees) to be selected, in increments of pi over eight (or 22.5 degrees).</Paragraph><Paragraph>The larger box is positioned below the first box and the sliding scales. This comprises two elements. On the right-hand side there is a figure comprising an X axis extending from 0 to just over 10, labelled t (for time) with units of milliseconds, and a Y axis extending from minus one to plus one, labelled V (for voltage) with units of volts. On opening the activity, there is nothing plotted on this figure. To the left of these axes there is a dotted circle. 
There is a horizontal dotted line through the centre of the circle that is aligned with the X axis in the figure to its right. The centre of the circle is labelled zero and the radius of the circle is equal to 0.5.</Paragraph><Paragraph>Below this box there are two buttons, one labelled ‘Generate’ and one labelled ‘Reset’. Clicking on the ‘Generate’ button starts an animation in the bottom box. Once the ‘Generate’ button has been clicked, the name changes to ‘Stop’; if this is clicked again, it pauses the animation.</Paragraph><Paragraph>Without changing any of the initial sliding scales at first, clicking on ‘Generate’ has two effects. Firstly, on the circle, a horizontal line appears from the centre of the circle to the right-hand edge of the circle (that is, the length of the line is the radius of the circle). This line rotates in an anti-clockwise direction about the centre of the circle. The length of the line is labelled ‘a’. The distance between the endpoint of this line and the dotted line across the centre of the circle (which changes as the line rotates) is labelled ‘y’. </Paragraph><Paragraph>The second effect is that as the line rotates, the point on the outer edge of the circle maps onto the axes to the right of the circle, and traces out a waveform. The two are linked by a dotted line to show how the variation of the line marked ‘<i>y</i>’ on the circle traces out a waveform on the axes that is the shape of a sine wave. This continues until just over two and a half cycles of the sine wave have been traced out, and then the line stops. </Paragraph><Paragraph>The sliding scales can be used to change the value of the properties of the sine wave illustrated in the top box. Once the properties have been changed, pressing ‘Generate’ will start the animation again but with the new values set by the sliding scales. 
Pressing the ‘Reset’ button at any time will return to the original settings (frequency to 200 hertz, amplitude to 0.5 and phase to zero) and remove the rotating line and traced waveform from the bottom box.</Paragraph><Paragraph>Changing the value of the frequency using the appropriate sliding scale has the following effects: </Paragraph><Paragraph>In the top box, increasing the frequency will have the effect of showing more cycles of the waveform. Likewise, decreasing the frequency means that fewer cycles will be shown. In the bottom box, increasing the frequency means that the rotating line in the circle rotates quicker. This also means that more cycles are traced out on the axes to the right of the circle as the line completes more rotations within the time frame indicated on the figure. Likewise, lowering the frequency means the line rotates more slowly and fewer cycles are illustrated on the axes. At a maximum frequency of 1000 hertz, ten cycles are illustrated within 10 milliseconds. At a minimum frequency of 100 hertz, one cycle is illustrated within 10 milliseconds. </Paragraph><Paragraph>Changing the value of the amplitude using the appropriate sliding scale has the following effects: </Paragraph><Paragraph>In the top box, increasing the amplitude will increase the peak value of the waveform illustrated. Likewise, decreasing the amplitude will decrease the peak value of the waveform. In the bottom box, increasing the amplitude means that the radius of the circle – and therefore the length of the rotating line in the circle, ‘a’ – increases. This also means that the peak value of the waveform that is traced out increases. Likewise, decreasing the amplitude means that the radius of the circle, and therefore the length of the rotating line in the circle (a) decreases. At a maximum amplitude of one, the length of the rotating line is one and the peak value of the sine wave that is traced out is one. 
</Paragraph><Paragraph>Changing the value of the phase using the appropriate sliding scale has the following effects: </Paragraph><Paragraph>In the top box, increasing or decreasing the phase will shift the waveform with respect to time. A positive phase value shifts the waveform to the left (an advance in time relative to a phase of zero). A negative phase value shifts the waveform to the right (a delay in time relative to a phase of zero). In the bottom box, a change in the value of phase means that the rotating line starts from a different position. For example, for a value of pi over two (or 90 degrees), the line starts in an upright (or vertical) position, so extends from the centre to the top point of the dotted circle. For a value of minus pi over two (or minus 90 degrees), the line starts in a position that extends from the centre to the bottom point of the dotted circle. The effect this has on the waveform that is traced out is that the waveform also has a different starting point. Again, for an example of plus pi over two, where the rotating line starts at the top, and y is at a maximum value, the waveform also starts at its maximum value. At a value of two pi (or 360 degrees), the line has in effect rotated around the whole circle and the result, both in terms of the rotating line and the waveform that is traced out, is the same as when the phase is zero.</Paragraph></Description>
            </MediaContent>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>1.2 Non-periodic signals</Title>
        <Paragraph>Communications is all about transferring information. A periodic signal, though, has limited possibilities for conveying information because of its predictability. After receiving a few cycles and establishing what the pattern is, we know the cycles that follow will be exactly the same. The signal may convey important information when it begins, as in the case of a fire alarm, where there is a call to immediate action as soon as the sound is heard. But a signal that never varies in amplitude, frequency, phase or any other aspect conveys little if any further information to a recipient. So in practical communications, exactly periodic signals are the exception. Signals that carry real information, such as speech, music or video, do not repeat endlessly.</Paragraph>
        <Paragraph>Non-periodic signals (also known as aperiodic signals), unlike periodic signals, do not have just one particular frequency. Instead, they are spread out over a continuous range of frequencies. For example, a speech signal ranges from around 100 Hz to a few thousand Hz (for telephone-quality speech, a range of 300 Hz to 3400 Hz is often assumed).</Paragraph>
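<Paragraph>One way to see this spreading numerically (a rough sketch added for illustration, not part of the original course) is to compare the spectrum of a continuous sinusoid with that of a short burst of the same sinusoid, using the fast Fourier transform:</Paragraph>

```python
import numpy as np

fs = 8000                              # sample rate in Hz (telephone-like)
t = np.arange(fs) / fs                 # one second of sample times
tone = np.sin(2 * np.pi * 1000 * t)    # continuous 1000 Hz sinusoid
burst = tone * (t < 0.01)              # the same tone lasting only 10 ms

def bins_above_half_max(signal):
    """Count frequency bins whose magnitude exceeds half the spectral peak."""
    spectrum = np.abs(np.fft.rfft(signal))
    return int(np.sum(spectrum > spectrum.max() / 2))

# With one second of samples, bin k corresponds to k Hz. The periodic tone
# concentrates its energy in a single bin at 1000 Hz, while the 10 ms burst
# spreads over a band of frequencies roughly 1/(0.01 s) = 100 Hz wide.
```

<Paragraph>The shorter the burst, the wider the band of frequencies it occupies: this reciprocal relationship between duration and bandwidth is a recurring theme in communications.</Paragraph>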
      </Section>
      <Section>
        <Title>1.3 Digital signals and modulation</Title>
<Paragraph>Radio waves are naturally sinusoidal, with frequencies covering a wide range. They are capable of travelling through space, and are widely used for communication. This section briefly explains how they are able to carry information. Many of the same principles apply to other communication media, such as optical signals and electric currents. </Paragraph>
        <Activity>
          <Heading>Activity 3 Exploratory</Heading>
          <Question>
            <Paragraph>Radio waves cover a wide range of frequencies, some of which are more suitable than others for a particular service. You can explore some uses of radio with this interactive chart.</Paragraph>
            <Paragraph>Click on the image of the electromagnetic spectrum below to learn more about the highlighted part of the spectrum (radio and microwave frequencies). You will see that this part of the spectrum is conventionally divided into bands, each covering a decade in frequency (or wavelength). Make a note of the frequencies and wavelengths and the typical uses of each band.</Paragraph>
<Paragraph>[The radio and microwave frequencies interactive will open in a new window. After you have viewed the interactive, click on the link ‘1.3 Digital signals and modulation’ to return to this page.]</Paragraph>
            <MediaContent webthumbnail="true" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/ems.zip" width="880" height="780" type="html5" id="x_imd_act_2" x_folderhash="596a8165" x_contenthash="0e3babbd" x_xhtml="y" x_smallsrc="ems.zip.jpg" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/ems.zip.jpg" x_smallwidth="501" x_smallheight="254">
              <Description><Paragraph>This interactive chart presents information in the form of a slider that allows you to select different spectrum bands within the radio and microwave sections of the electromagnetic spectrum (wavelength greater than 300 micrometres, frequency less than 10 to the power 12 hertz). When a band is selected, the following information is displayed: frequency range, wavelength, representative image, uses and method of propagation. This information is presented in the table below.</Paragraph><Table class="type 2" style="topbottomrules"><TableHead/><tbody><tr><th>Spectrum band</th><th>Frequency</th><th>Wavelength</th><th>Representative image</th><th>Uses</th><th>Method of propagation</th></tr><tr><td><Paragraph>Very low frequency (VLF)</Paragraph></td><td><Paragraph>3 kilohertz to 30 kilohertz</Paragraph></td><td><Paragraph>10 kilometres to 100 kilometres</Paragraph></td><td><Paragraph>Submarine</Paragraph></td><td><Paragraph>This band is not used much for communications in general, because of the difficulties in building efficient antennas for the long wavelengths involved (10 to 100 kilometres). </Paragraph><Paragraph>Radio waves at these frequencies, or lower, have the ability to penetrate sea water to an extent, so this band is used for submarine communications.</Paragraph><Paragraph>The VLF band has been used for global navigation systems with ground-based transmitters. </Paragraph></td><td><Paragraph>Surface wave.</Paragraph></td></tr><tr><td><Paragraph>Low frequency (LF)</Paragraph></td><td><Paragraph>30 kilohertz to 300 kilohertz</Paragraph></td><td><Paragraph>1 kilometre to 10 kilometres</Paragraph></td><td><Paragraph>Radio</Paragraph></td><td><Paragraph>This band is involved in long-wave radio broadcasting using amplitude modulation (AM). Surface wave propagation is effective at these frequencies, allowing coverage of a large region with one high-power transmitter. 
The bandwidth available at LF is limited, though (only 270 kilohertz altogether). Also, a large land area is needed for an effective broadcast transmitting antenna at these long wavelengths. </Paragraph><Paragraph>Navigation systems using ground-based transmitters operate in the LF band. They provide a backup to satellite-based systems such as GPS, which operates in the UHF band.</Paragraph><Paragraph>The LF band is used in some territories for broadcasting time signals.</Paragraph></td><td><Paragraph>Mainly surface wave.</Paragraph></td></tr><tr><td><Paragraph>Medium frequency (MF)</Paragraph></td><td><Paragraph>300 kilohertz to 3 megahertz</Paragraph></td><td><Paragraph>100 metres to 1 kilometre</Paragraph></td><td><Paragraph>Portable radio, radio mast</Paragraph></td><td><Paragraph>This band is involved in medium-wave radio broadcasting using amplitude modulation (AM). The reception range is increased at night time by sky-wave propagation; while this allows more stations to be received, it can also result in interference from unwanted transmissions. The transmitting antenna may be a quarter-wavelength monopole, as shown here. (A similar design for LF would need a much taller mast.)</Paragraph></td><td><Paragraph>Surface wave. Sky wave at night time.</Paragraph></td></tr><tr><td><Paragraph>High frequency (HF)</Paragraph></td><td><Paragraph>3 megahertz to 30 megahertz</Paragraph></td><td><Paragraph>10 metres to 100 metres</Paragraph></td><td><Paragraph>Amateur radio</Paragraph></td><td><Paragraph>This band is used for amateur radio; global distances can be covered by sky-wave propagation, often with modest transmitter power. </Paragraph><Paragraph>Propagation at HF is highly dependent on the state of the ionosphere. 
Nonetheless, the HF band is used for short-wave radio broadcasting, and also for a variety of government, military, aviation and maritime purposes.</Paragraph></td><td><Paragraph>Sky wave.</Paragraph></td></tr><tr><td><Paragraph>Very high frequency (VHF)</Paragraph></td><td><Paragraph>30 megahertz to 300 megahertz</Paragraph></td><td><Paragraph>1 metre to 10 metres</Paragraph></td><td><Paragraph>Digital radio, aeroplane, taxi</Paragraph></td><td><Paragraph>The frequency range of VHF allows wider-bandwidth services to be accommodated than is possible at HF, MF or LF. The shorter wavelength of VHF compared to HF also means that antennas are more compact. These features make the VHF band attractive for a wide variety of services.</Paragraph><Paragraph>FM radio and DAB radio, which require higher bandwidths than the AM transmissions at MF and LF, are situated in this band. The band is also used for aviation, and for some private mobile radio (e.g. taxi services).</Paragraph></td><td><Paragraph>Line of sight, together with contributions from reflected, diffracted and scattered waves.</Paragraph></td></tr><tr><td><Paragraph>Ultra high frequency (UHF)</Paragraph></td><td><Paragraph>300 megahertz to 3 gigahertz</Paragraph></td><td><Paragraph>100 millimetres to 1 metre</Paragraph></td><td><Paragraph>Mobile phone, microwave oven, TV aerial</Paragraph></td><td><Paragraph>This band affords more spectrum bandwidth than VHF, so terrestrial TV broadcasting in the UK is done at UHF (though VHF was used in the days before UHF receivers were widely available, and fewer channels were broadcast). Some (not all) channels of 8 megahertz bandwidth in the range 470 to 790 megahertz are used, each channel carrying a digital multiplex.</Paragraph><Paragraph>The relatively short wavelengths at UHF mean that antennas can be compact enough to fit into small mobile devices. 
The propagation characteristics of this band are favourable for short- to medium-range links, and the technology is mature. So it is not surprising that a huge variety of applications are found in this band, including mobile phones (at 800 megahertz, 900 megahertz, 1.8 gigahertz, 2.1 gigahertz and 2.6 gigahertz), WiFi (at 2.4 gigahertz), Bluetooth, GPS, cordless phones and baby monitors. Some radar applications also operate here. </Paragraph><Paragraph>Microwave ovens work at a frequency of 2.45 gigahertz, which is within this band. </Paragraph></td><td><Paragraph>Line of sight, together with contributions from reflected, diffracted and scattered waves.</Paragraph></td></tr><tr><td><Paragraph>Super high frequency (SHF)</Paragraph></td><td><Paragraph>3 gigahertz to 30 gigahertz</Paragraph></td><td><Paragraph>10 millimetres to 100 millimetres</Paragraph></td><td><Paragraph>Satellite dish, WiFi access point</Paragraph></td><td><Paragraph>Satellite broadcasting operates in this band, where it is able to take advantage of the wider bandwidths that are available for allocation, compared to UHF and lower bands. Atmospheric absorption is relatively low compared to that in the next frequency band (EHF).</Paragraph><Paragraph>Congestion in the UHF spectrum has also encouraged a move to higher frequencies for wireless networking, in particular the 5 gigahertz WiFi standard. 
</Paragraph><Paragraph>The SHF band is used by many radar applications.</Paragraph></td><td><Paragraph>Line of sight, together with contributions from reflected, diffracted and scattered waves.</Paragraph></td></tr><tr><td><Paragraph>Extremely high frequency (EHF)</Paragraph></td><td><Paragraph>30 gigahertz to 300 gigahertz</Paragraph></td><td><Paragraph>1 millimetre to 10 millimetres</Paragraph></td><td><Paragraph>Large dish antenna, satellite</Paragraph></td><td><Paragraph>This band is important for specialised uses such as radio astronomy and satellite exploration.</Paragraph><Paragraph>In the past the EHF band was not used much for general communications, owing to the technical challenges of transmitter and receiver design at these frequencies. However, this is changing, with 60 gigahertz proposed as a WiFi band for short-range data communication. This particular frequency is not suitable for long-range terrestrial communication because of a peak in atmospheric absorption. </Paragraph></td><td><Paragraph>Line of sight. Atmospheric absorption is strong, particularly at 60 gigahertz and 200 gigahertz.</Paragraph></td></tr></tbody></Table></Description>
            </MediaContent>
          </Question>
        </Activity>
        <Paragraph>Generally a medium used for communication (such as radio waves) needs to be processed in some way to carry information. The process is called <b>modulation</b>. Two signals are combined in modulation:</Paragraph>
        <BulletedList>
          <ListItem>The message signal, called the modulating signal. (Often this is non-periodic.)</ListItem>
          <ListItem>A signal of the right frequency for transmission, called the carrier signal.</ListItem>
        </BulletedList>
        <Paragraph>When they are combined, the modulating signal changes the carrier signal in some way, such as by changing its amplitude or frequency. This creates a new signal that contains the message information and is also at the correct transmission frequency. Note that although modulation of some kind is essential for wireless transmission, it is also used in much wired transmission, for example broadband and optical fibre.</Paragraph>
<Paragraph>In the next section, assume that the message to be sent is in the form of a <b>digital</b> signal (that is, a signal that is interpreted as a sequence of discrete values). In fact, most communications fall into this category: computer networks, almost all telephony, and digital TV and radio. Analogue signals such as speech are converted to digital form at one end of a communications link and back to analogue at the other. When the message signal is digital, modulation produces distinct states of the carrier wave that can be distinguished by the receiver and can be used to represent ones and zeros, or groups of ones and zeros. Next you will see some basic digital modulation schemes.</Paragraph>
      </Section>
      <Section>
        <Title>1.4 Amplitude-shift keying (ASK)</Title>
<Paragraph>In ASK, only the amplitude of the carrier signal is modified in modulation. The simplest version is on–off keying (OOK). In OOK, either bursts of a carrier wave are transmitted or nothing is transmitted, depending on whether the input message signal is 1 or 0. Other versions of ASK use differing (non-zero) amplitudes to represent 1 and 0. </Paragraph>
        <Paragraph>Figure 2(a) shows a digital message signal using two voltage levels. One level represents 1 and the other represents 0. The unmodulated carrier is illustrated in Figure 2(b). Figure 2(c) and (d) are the modulated waveforms using two versions of ASK. Figure 2(c) uses OOK, and 2(d) uses binary ASK, or BASK.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f018.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f018.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="8d9dd44b" x_imagesrc="tm355_bk1_pt2_f018.eps.jpg" x_imagewidth="493" x_imageheight="561"/>
          <Caption><b>Figure 2</b> ASK: (a) data; (b) unmodulated carrier; (c) on–off keying (OOK); (d) binary amplitude-shift keying (BASK)</Caption>
          <Description><Paragraph>This figure consists of four line graphs labelled A to D. Each graph has the X or horizontal axis labelled as time, measured in seconds. The Y or vertical axes are labelled as voltage, measured in volts.</Paragraph><Paragraph>Part A shows how a series of zeros and ones can be represented by a digital signal. Time periods are represented on the graph by a series of equidistant dotted vertical lines. The sequence 0 1 1 1 0 0 0 1 0 1 is shown from left to right. Each 0 and 1 in the series corresponds to one of the time periods created by the dotted vertical lines. The voltage is initially at value zero for one time period, then rises to a value of one for three time periods, and falls to zero for three time periods. It then rises to one for a single time period, drops to zero for a single time period and rises to one for a single time period.</Paragraph><Paragraph>Part B shows a sinusoidal waveform starting at the origin. It rises smoothly to a rounded peak of 1 volt. The line then drops smoothly down to a rounded trough the same distance below the X axis as the peak is above. Then it rises again to the next peak, which is the same height as the previous peak, and then drops again to a trough of the same depth. Many cycles are shown. The frequency of the sinusoidal waveform corresponds to three cycles per time period created by the dotted vertical lines.</Paragraph><Paragraph>Part C is a combination of parts A and B. Where the voltage in part A is zero, no waveform is shown. Where the voltage in part A is one, a sinusoidal segment of 1 volt amplitude is shown. Time periods are represented by a series of equidistant dotted vertical lines, and correspond to those illustrated in part A. The signal is initially at value zero for one time period, then the sinusoid is shown for three time periods, before reverting to zero for three time periods. 
The sinusoid then appears for a single time period, drops to zero for a single time period and appears again for a single time period.</Paragraph><Paragraph>Part D is similar to part C, but where the voltage in part A is zero, here a waveform is shown with reduced amplitude. Where the voltage in part A is one, a sinusoidal signal of full amplitude of 1 volt is shown. Time periods are represented by a series of equidistant dotted vertical lines, as in parts A and C. The sinusoidal signal has a smaller amplitude of about 0.5 volts for one time period, then the sinusoid is shown at full amplitude for three time periods, before reverting back to the smaller amplitude for three time periods. The sinusoid then appears at full amplitude for a single time period, drops to small amplitude for a single time period and appears again at full amplitude for a single time period.</Paragraph></Description>
        </Figure>
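<Paragraph>The bit-by-bit switching of carrier amplitude shown in Figure 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the sample rate, carrier frequency and amplitude values are arbitrary choices, not taken from the course materials.</Paragraph>

```python
import math

def ask_modulate(bits, amp_one=1.0, amp_zero=0.0,
                 cycles_per_bit=3, samples_per_cycle=20):
    """Binary ASK: scale a sinusoidal carrier by one of two amplitudes
    for each bit. amp_zero=0 gives on-off keying (OOK); a non-zero
    amp_zero gives BASK."""
    samples_per_bit = cycles_per_bit * samples_per_cycle
    signal = []
    for i, bit in enumerate(bits):
        amp = amp_one if bit == 1 else amp_zero
        for k in range(samples_per_bit):
            # time measured in carrier cycles, so the carrier phase
            # runs continuously across bit boundaries
            t = (i * samples_per_bit + k) / samples_per_cycle
            signal.append(amp * math.sin(2 * math.pi * t))
    return signal

ook = ask_modulate([0, 1, 1, 1, 0], amp_zero=0.0)   # Figure 2(c) style
bask = ask_modulate([0, 1, 1, 1, 0], amp_zero=0.5)  # Figure 2(d) style
```

<Paragraph>Setting the ‘0’ amplitude to zero reproduces the OOK waveform of Figure 2(c); a non-zero value gives the BASK waveform of Figure 2(d).</Paragraph>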
<Paragraph>In OOK and BASK, the modulated carrier can take one of two different states: one state representing a 0, the other a 1. These different carrier states are known as <b>symbols</b>. If there are more than two possible carrier states – that is, more than two symbols available – then it is possible for each symbol to represent more than one bit. </Paragraph>
        <Paragraph>Figure 3 shows ASK with four possible amplitude levels, or four symbols. With four symbols available, each symbol can be uniquely represented with a two-bit binary number. This is because there are just four possible two-bit binary numbers: 11, 10, 01 and 00.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f023.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f023.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="cd2258f3" x_imagesrc="tm355_bk1_pt2_f023.eps.jpg" x_imagewidth="417" x_imageheight="195"/>
          <Caption><b>Figure 3</b> ASK with four amplitude levels</Caption>
          <Description>The diagram shows a sinusoidal waveform at four amplitude levels. Four time periods are represented by five equidistant dotted vertical lines. From left to right, the amplitude of the waveform drops in each time period. The four possible amplitude levels are marked as A, B, C and D, with A being the highest (on the left) and D being the lowest (on the right). The amplitude levels A to D correspond to the two-bit binary numbers 1 1, 1 0, 0 1 and 0 0.</Description>
        </Figure>
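<Paragraph>The mapping from two-bit groups to the four amplitude levels of Figure 3 can be sketched as follows. The numeric amplitudes are illustrative choices only; the figure fixes just their ordering (A highest, D lowest).</Paragraph>

```python
# Map two-bit groups to four amplitude levels, as in Figure 3.
# The actual amplitude values are assumptions for illustration.
LEVELS = {(1, 1): 1.0,   # level A
          (1, 0): 0.75,  # level B
          (0, 1): 0.5,   # level C
          (0, 0): 0.25}  # level D

def bits_to_amplitudes(bits):
    """Group a bit sequence into pairs and return one amplitude
    (i.e. one symbol) per pair."""
    assert len(bits) % 2 == 0, "need a whole number of two-bit groups"
    return [LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

print(bits_to_amplitudes([1, 1, 1, 0, 0, 1, 0, 0]))
# -> [1.0, 0.75, 0.5, 0.25]: eight bits sent as only four symbols
```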
<Paragraph>If there were eight symbols, each could represent three data bits. The relationship between the number of available symbols, <i>M</i>, and the number of bits that can be represented by a symbol, <i>n</i>, is: </Paragraph>
        <Equation>
          <Image><i>M</i> = 2<i><sup>n</sup></i></Image>
        </Equation>
        <Paragraph>The term <b>baud</b> refers to the number of symbols per second, where one baud is one symbol per second. </Paragraph>
<Paragraph>Data rate (or bit rate) and baud are closely related: the data rate in bits per second is the symbol rate in baud multiplied by the number of bits each symbol represents. </Paragraph>
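<Paragraph>These two relationships can be checked numerically. The sketch below is illustrative only; the symbol counts and baud figure are arbitrary examples.</Paragraph>

```python
import math

def bits_per_symbol(num_symbols):
    """Return n such that M = 2**n (M must be a power of two)."""
    n = int(math.log2(num_symbols))
    assert 2 ** n == num_symbols, "symbol count must be a power of two"
    return n

def data_rate(baud, num_symbols):
    """Bit rate in bit/s = symbols per second x bits per symbol."""
    return baud * bits_per_symbol(num_symbols)

print(bits_per_symbol(64))   # 64 = 2**6, so 6 bits per symbol
print(data_rate(5000, 64))   # 5000 baud x 6 bits = 30000 bit/s
```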
        <Activity>
          <Heading>Activity 4 Self assessment</Heading>
          <Question>
            <NumberedList class="lower-alpha">
              <ListItem>If a communications system uses 16 symbols, how many bits does each symbol represent?</ListItem>
              <ListItem>If the same system has a symbol rate of 10 000 baud, what is the data rate?</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <NumberedList class="lower-alpha">
              <ListItem>If there are 16 symbols, then each of these can represent 4 bits, because 16 = 2<sup>4</sup>. </ListItem>
<ListItem>There are 10 000 symbols per second, and each symbol represents 4 bits, so the number of bits per second is 4 × 10 000 = 40 000. So the data rate (or bit rate) is 40 000 bit s<sup>−1</sup>, also written 40 kbit s<sup>−1</sup>.</ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>Increasing the number of bits a symbol can represent means that higher data rates can be achieved.</Paragraph>
      </Section>
      <Section>
        <Title>1.5 Frequency-shift keying (FSK)</Title>
        <Paragraph>In FSK, the frequency of the carrier signal is modified. An illustration of binary FSK, or BFSK, is given in Figure 4. Here, bursts of a carrier wave at one frequency or bursts of a carrier wave at a second frequency are transmitted according to whether the input data is 1 or 0.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f019.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f019.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="03661243" x_imagesrc="tm355_bk1_pt2_f019.eps.jpg" x_imagewidth="512" x_imageheight="150"/>
          <Caption><b>Figure 4</b> Binary FSK</Caption>
          <Description><Paragraph>This diagram shows a line graph. The X or horizontal axis is labelled as time, measured in seconds. The Y or vertical axis is labelled as voltage, measured in volts. Time periods are represented on the diagram by a series of equidistant dotted vertical lines. Each time period is labelled with a 1 or a 0, in the sequence 1 0 0 0 1 1 0 1 0 0 from left to right.</Paragraph><Paragraph>Here a sinusoidal signal of amplitude 1 volt is shown varying between a relatively high frequency (three cycles per time period) and a relatively low frequency (one and a half cycles per time period). Periods in which the frequency is high correspond to a 1, and periods in which the frequency is low correspond to a 0. Thus the overall pattern of the sinusoid is as follows: high frequency for one time period, low frequency for three time periods, high frequency for two time periods, low frequency for one time period, high frequency for one time period, low frequency for two time periods.</Paragraph></Description>
        </Figure>
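<Paragraph>The frequency switching shown in Figure 4 can be sketched as follows. The two frequencies (3 cycles and 1.5 cycles per bit period) match the figure; everything else is an arbitrary illustration, and for simplicity the carrier phase restarts at each bit rather than running continuously.</Paragraph>

```python
import math

def bfsk_modulate(bits, cycles_one=3.0, cycles_zero=1.5, samples_per_bit=60):
    """Binary FSK: transmit a sinusoid whose frequency depends on the
    bit - 3 carrier cycles per bit period for a 1, 1.5 for a 0,
    matching Figure 4. Phase continuity between bits is ignored here."""
    signal = []
    for bit in bits:
        cycles = cycles_one if bit == 1 else cycles_zero
        for k in range(samples_per_bit):
            signal.append(math.sin(2 * math.pi * cycles * k / samples_per_bit))
    return signal

wave = bfsk_modulate([1, 0, 0, 0, 1])
```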
      </Section>
      <Section>
        <Title>1.6 Phase-shift keying (PSK)</Title>
<Paragraph>The third fundamental digital modulation technique, and the most widely used in one form or another, is PSK. Its simplest form is binary phase-shift keying (BPSK).</Paragraph>
        <Paragraph>In BPSK, 0 and 1 are represented by segments of sinusoids that differ in their phase. At the receiver, distinguishing between the two segments is easier if their phases differ by as much as possible. In BPSK the phases are separated by half a cycle (equivalent to π radians or 180°). See Figure 5. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f020.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f020.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="2d071fde" x_imagesrc="tm355_bk1_pt2_f020.eps.jpg" x_imagewidth="512" x_imageheight="150"/>
          <Caption><b>Figure 5</b> BPSK</Caption>
          <Description><Paragraph>This diagram shows a line graph. The X or horizontal axis is labelled as time, measured in seconds. The Y or vertical axis is labelled as voltage, measured in volts. Time periods are represented on the diagram by a series of equidistant dotted vertical lines. Each time period is labelled with a 1 or a 0, in the sequence 1 0 0 0 1 1 0 1 0 0 from left to right.</Paragraph><Paragraph>Here a sinusoidal signal of amplitude 1 volt is shown varying between two different phases, one in which the waveform rises to a peak at the beginning of the time period (the first phase), and one in which it descends to a trough at the beginning (the second phase). Periods in which the waveform has the first phase correspond to a 1, and periods in which the waveform has the second phase correspond to a 0. The sinusoid switches between the two phases as follows: first phase for one time period, second phase for three time periods, first phase for two time periods, second phase for one time period, first phase for one time period, second phase for two time periods.</Paragraph></Description>
        </Figure>
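<Paragraph>The half-cycle phase shift in Figure 5 amounts to multiplying the carrier by +1 or −1, and a receiver can then recover each bit by correlating the received waveform with a reference carrier. The sketch below illustrates this under arbitrary sampling choices; it is not taken from the course materials.</Paragraph>

```python
import math

SAMPLES_PER_BIT = 60
CYCLES_PER_BIT = 3

def carrier(k):
    """Reference carrier sample at index k within a bit period."""
    return math.sin(2 * math.pi * CYCLES_PER_BIT * k / SAMPLES_PER_BIT)

def bpsk_modulate(bits):
    """BPSK: a 1 sends the carrier, a 0 sends the carrier inverted
    (a phase shift of pi radians, i.e. 180 degrees)."""
    return [(1 if bit == 1 else -1) * carrier(k)
            for bit in bits for k in range(SAMPLES_PER_BIT)]

def bpsk_demodulate(signal):
    """Correlate each bit period with the reference carrier; the sign
    of the correlation gives the transmitted bit."""
    bits = []
    for i in range(0, len(signal), SAMPLES_PER_BIT):
        corr = sum(signal[i + k] * carrier(k) for k in range(SAMPLES_PER_BIT))
        bits.append(1 if corr > 0 else 0)
    return bits

data = [1, 0, 0, 0, 1, 1, 0, 1]
assert bpsk_demodulate(bpsk_modulate(data)) == data  # round trip recovers the data
```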
<Paragraph>A BPSK-modulated signal is less susceptible to certain kinds of noise than an ASK-modulated one. </Paragraph>
        <Activity>
          <Heading>Activity 5 Self assessment</Heading>
          <Question>
            <Paragraph>Figure 6 shows three examples of digitally modulated waveforms. For each example, decide which modulation scheme has been used and, based on the figures you saw earlier, work out what binary data each of these represents. </Paragraph>
            <Figure>
              <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f021.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f021.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="f2511281" x_imagesrc="tm355_bk1_pt2_f021.eps.jpg" x_imagewidth="471" x_imageheight="463"/>
              <Caption><b>Figure 6</b> Three digitally modulated waveforms</Caption>
              <Description><Paragraph>This diagram shows three digitally modulated waveforms labelled A to C. Time periods are represented by a series of equidistant dotted vertical lines. The X or horizontal axes are labelled as time, measured in seconds. The Y or vertical axes are labelled as voltage, measured in volts. </Paragraph><Paragraph>In part A, a sinusoidal signal of amplitude 1 volt is shown for three time periods, then the phase changes for one time period, before reverting back to the original phase for one time period. The sinusoid then changes phase again for three time periods.</Paragraph><Paragraph>In part B, the sinusoidal signal has a small amplitude of about 0.5 volt for four time periods, then rises to an amplitude of 1 volt for two time periods. It then reverts to the smaller amplitude for one time period, and back to the higher amplitude for one time period.</Paragraph><Paragraph>In part C, a sinusoidal signal of amplitude 1 volt has a relatively high frequency for three time periods. Then the frequency lowers for two time periods, before reverting back to the original frequency for one time period. The sinusoid then changes to the lower frequency for a single time period, then back to the higher frequency for a single time period.</Paragraph></Description>
            </Figure>
          </Question>
          <Answer>
            <Paragraph>Waveform (a) is an example of a BPSK-modulated waveform representing the data:</Paragraph>
            <Equation>
              <Image>0 0 0 1 0 1 1 1.</Image>
            </Equation>
            <Paragraph>Waveform (b) is an example of a BASK-modulated waveform representing the data:</Paragraph>
            <Equation>
              <Image>0 0 0 0 1 1 0 1.</Image>
            </Equation>
            <Paragraph>Waveform (c) is an example of a BFSK-modulated waveform representing the data:</Paragraph>
            <Equation>
              <Image>1 1 1 0 0 1 0 1.</Image>
            </Equation>
          </Answer>
        </Activity>
        <Activity>
          <Heading>Activity 6 Exploratory</Heading>
          <Question>
<Paragraph>This interactive activity will allow you to explore four binary digital modulation schemes: OOK, ASK, BFSK and BPSK.</Paragraph>
            <Paragraph>Start the activity by clicking on the image or ‘View’ link below. You will see that you are invited to ‘Create a binary data stream’. Enter a series of 0s and 1s, then click on ‘Submit’ to create a modulating waveform and use this to modulate a carrier using one of the modulation schemes. You can change the modulation scheme using the drop-down menu at the top left, and change the carrier frequency using the slider at the top right.</Paragraph>
            <Paragraph>Try creating different modulated waveforms.</Paragraph>
            <MediaContent webthumbnail="true" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/keying.zip" width="840" height="640" type="html5" id="x_imd_act_3" x_folderhash="596a8165" x_contenthash="04458b04" x_xhtml="y" x_smallsrc="keying.zip.jpg" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/keying.zip.jpg" x_smallwidth="510" x_smallheight="386">
              <Description><Paragraph>Initially a box is displayed with text at the top, and three separate blank spaces below.</Paragraph><Paragraph>The first line of text reads ‘Create a binary data stream’. This is followed by an input box, with the greyed-out text ‘e.g. 1001011’, and next to this a button labelled ‘Submit’.</Paragraph><Paragraph>Below this is the text ‘Select binary modulation scheme:’ and a drop-down menu with the following options: ‘OOK’, ‘ASK’, ‘BFSK’ and ‘BPSK’. This is followed by a sliding scale with the text ‘Carrier frequency:’, which is labelled ‘low’ at the left-hand side and ‘high’ at the right-hand side. </Paragraph><Paragraph>The input box allows a stream of zeros and ones to be typed in. This can be any combination of zeros and ones but must be between one and ten characters in length. </Paragraph><Paragraph>Once the binary stream has been input, the drop-down menu is used to select the required binary modulation scheme. </Paragraph><Paragraph>Pressing the Submit button then generates three waveforms in the three blank spaces below. </Paragraph><Paragraph>All three waveforms have an X axis labelled time, with no units or values indicated, and a Y axis labelled voltage (V).</Paragraph><Paragraph>The first waveform is labelled ‘Data’ and illustrates the binary data stream as a series of voltage levels. The Y axis extends from 0 to 1 and the X axis is segmented into equally spaced sections, where the duration of each segment is associated with one binary data bit. Where the binary data is a ‘one’, a voltage level of ‘one’ is illustrated for the duration of the segment. Where the binary data is a ‘zero’, a voltage level of ‘zero’ is illustrated for the duration of the segment.</Paragraph><Paragraph>The second waveform is labelled ‘Unmodulated carrier’. This is a sine wave and the Y axis extends from plus one to minus one. 
The segments that correspond to the duration of each of the binary data bits in the first waveform are also indicated on the X axis here. The frequency of the sine wave can be varied by using the sliding scale labelled ‘Carrier frequency’ described previously. With the sliding scale at its lowest value, two full cycles are completed within each bit duration. With the sliding scale at its highest value, ten full cycles are completed within each bit duration. </Paragraph><Paragraph>The third waveform is labelled according to the binary modulation scheme that was selected using the drop-down menu. That is, either ‘On-off keying (OOK)’, ‘Amplitude shift keying (ASK)’, ‘Frequency shift keying (BFSK)’ or ‘Phase shift keying (BPSK)’. This waveform also has the segments that indicate the bit durations marked, and the Y axis extends from plus one to minus one.</Paragraph><Paragraph>If OOK has been selected from the drop-down menu, then where the bit value illustrated in the first waveform is one, the third waveform looks the same as the unmodulated carrier. Where the bit value illustrated in the first waveform is zero, the third waveform is also zero. </Paragraph><Paragraph>If ASK has been selected, then where the bit value illustrated in the first waveform is one, the third waveform looks the same as the unmodulated carrier. Where the bit value illustrated in the first waveform is zero, the third waveform looks like the unmodulated carrier in terms of phase and frequency, but the peak value is smaller – approximately 0.5. </Paragraph><Paragraph>If BFSK has been selected, then where the bit value illustrated in the first waveform is one, the third waveform looks the same as the unmodulated carrier. Where the bit value illustrated in the first waveform is zero, the third waveform looks like the unmodulated carrier in terms of phase and amplitude, but the frequency of the waveform is lower – in fact, it is half the frequency of the carrier. 
</Paragraph><Paragraph>If BPSK has been selected, then where the bit value illustrated in the first waveform is one, the third waveform looks the same as the unmodulated carrier. Where the bit value illustrated in the first waveform is zero, the third waveform looks like the unmodulated carrier in terms of frequency and amplitude, but the phase of the waveform is different – in fact, it is the unmodulated carrier with a phase shift of pi radians or 180 degrees. </Paragraph></Description>
            </MediaContent>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>1.7 Quadrature amplitude modulation (QAM)</Title>
<Paragraph>It is possible to combine ASK, FSK and PSK. One benefit of combining different modulation methods is to increase the number of symbols available, which is a standard way to increase the bit rate: more symbols mean more bits per symbol. It is rare for all three methods to be combined, but very common for ASK and PSK to be combined to create <b>quadrature amplitude modulation (QAM)</b>. </Paragraph>
<Paragraph>QAM is based on the application of ASK and PSK to two sinusoidal waves of the same frequency but with a phase difference of 90°. Sinusoidal waves 90° apart are said to be in a quadrature phase relationship. It is customary to refer to one of these waves as the <b>I wave</b>, or in-phase wave or component, and the other as the <b>Q wave</b>, or quadrature wave or component (Figure 7).</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f026.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f026.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="7b42d01f" x_imagesrc="tm355_bk1_pt2_f026.eps.jpg" x_imagewidth="389" x_imageheight="332"/>
          <Caption><b>Figure 7</b> (a) I (in-phase or sine) wave and (b) Q (quadrature or cosine) wave</Caption>
          <Description><Paragraph>This figure consists of two line graphs, labelled A and B, showing sinusoids out of phase with each other. The X or horizontal axes are labelled as time. The Y or vertical axes are labelled as voltage. </Paragraph><Paragraph>Graph A shows a sinusoidal waveform starting at the origin. It rises smoothly to a rounded peak. The line then drops smoothly down to a rounded trough the same distance below the X axis as the peak is above. Then it rises again to the next peak, which is the same height as the previous peak, and then drops again to a trough of the same depth. Several cycles are shown. The waveform is labelled I.</Paragraph><Paragraph>Graph B shows the waveform starting above the origin, at maximum voltage. It falls smoothly to a rounded trough. The line then rises smoothly to a rounded peak the same distance above the X axis as the trough is below. Then it falls again to the next trough, which is the same depth as the previous trough. Several cycles are shown. The waveform is labelled Q.</Paragraph></Description>
        </Figure>
        <Paragraph>You may recognise the I wave in Figure 7 as a sine function and the Q wave as a cosine function. These functions are said to be <b>orthogonal</b> to each other. If two signals are orthogonal, when they are transmitted simultaneously one can be completely recovered at the receiver without any interference from the other.</Paragraph>
        <Paragraph>The I and Q waves remain orthogonal if either or both of them are inverted (multiplied by –1, or flipped vertically). Negative amplitudes just mean that the wave is inverted.</Paragraph>
        <Paragraph>The set of symbols in QAM can be conveniently represented on a <b>signal constellation diagram</b> (Figure 8). This is a plot of the I and Q amplitudes with I on the horizontal axis and Q on the vertical axis. Each dot in Figure 8 is a symbol, as it represents a unique combination of amplitude and phase of the I and Q waves. So, in each symbol period, only one of the ‘dots’ is transmitted. As there are 16 symbols, this version of QAM is called 16-QAM.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_f032.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_f032.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="fafd1ebd" x_imagesrc="tm355_bk1_pt2_f032.eps.png" x_imagewidth="330" x_imageheight="325"/>
          <Caption><b>Figure 8</b> Constellation diagram for 16-QAM</Caption>
          <Description><Paragraph>This diagram shows a pair of axes. The X axis is labelled I and has points minus 3, minus 1, 1 and 3 marked on the axis. The Y axis is labelled Q and likewise has points minus 3, minus 1, 1 and 3 marked on the axis.</Paragraph><Paragraph>There are 16 dots positioned between the axes in a square formation, four rows of four. There is a dot at the intersection of each of the points marked on the axes. Thus the dots are spaced at regular intervals, the same distance apart.</Paragraph></Description>
        </Figure>
        <Paragraph>To understand what each dot in the diagram represents, take the top left one. This represents a symbol where the Q wave is at an amplitude of 3 and the I wave is at an amplitude of –3. The minus sign means the I wave is inverted (or phase shifted by 180°) relative to the I wave in Figure 7(a).</Paragraph>
        <Paragraph>As the number of symbols increases, more data bits are transmitted per symbol. For example, 64-QAM is a QAM scheme with 64 symbols, and 256-QAM is a scheme with 256 symbols. 256-QAM conveys 8 bits per symbol (as 256 = 2<sup>8</sup>), so achieving twice the data rate of 16-QAM for the same symbol rate. </Paragraph>
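        <Paragraph>The relationship between the number of symbols and the number of bits per symbol, and the square grid of points in Figure 8, can be sketched in Python. This is an illustrative sketch only; the function names are my own.</Paragraph>
        <ProgramListing>
```python
import math

def bits_per_symbol(m):
    # An M-symbol scheme conveys log2(M) bits per symbol (M a power of 2).
    return int(math.log2(m))

def square_qam(m):
    # Points of a square M-QAM constellation at odd-integer levels,
    # as in Figure 8, where 16-QAM uses -3, -1, 1 and 3 on each axis.
    side = int(math.isqrt(m))
    levels = range(-(side - 1), side, 2)
    return [(i, q) for i in levels for q in levels]

print(bits_per_symbol(16), bits_per_symbol(256))  # 4 8
print(len(square_qam(16)))                        # 16
```
        </ProgramListing>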
        <Activity>
          <Heading>Activity 7 Self assessment</Heading>
          <Question>
            <Paragraph>How many bits are represented by each symbol in 64-QAM? Sketch a constellation diagram for 64-QAM.</Paragraph>
          </Question>
          <Answer>
            <Paragraph>For 64-QAM, the number of symbols <i>M</i> = 64. There are six bits per symbol, as <i>M</i> = 2<i><sup>n</sup></i> and 64 = 2<sup>6</sup>.</Paragraph>
            <Paragraph>A constellation diagram for 64-QAM might look like this:</Paragraph>
            <Figure>
              <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk1_pt2_ansf014.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk1_pt2_ansf014.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="b124cba9" x_imagesrc="tm355_bk1_pt2_ansf014.eps.jpg" x_imagewidth="439" x_imageheight="435"/>
              <Caption>Constellation diagram for 64-QAM</Caption>
            </Figure>
          </Answer>
        </Activity>
        <Paragraph>The points on the diagram in the answer to Activity 7 are placed at values of ±1, ±3, ±5 and ±7. The actual amplitudes used in practice are likely to be different; but if the spacing between constellation points remains the same (2 in this case) and we keep adding more points in this way, then we are increasing the power in the signal. The further away from the origin a constellation point is, the more power is required in the signal. Alternatively, it might be necessary to keep the maximum signal power constant whether we are using 16-QAM or 64-QAM, for instance. This would mean packing the points closer together in 64-QAM than in 16-QAM. However, if the points are closer together, then a symbol will be more likely to be misinterpreted at the receiver as one of its neighbours. One of the effects of noise (which is unavoidable in communication) is to add a degree of uncertainty about which symbol has arrived at the receiver.</Paragraph>
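        <Paragraph>The effect on signal power can be made concrete with a short calculation. Taking the power of a constellation point as I² + Q², and keeping the spacing of 2 between neighbouring points, the mean symbol power of 64-QAM is more than four times that of 16-QAM. A minimal Python sketch (the function names are my own):</Paragraph>
        <ProgramListing>
```python
import math

def square_qam(m):
    # Square M-QAM points at odd-integer levels, spacing 2 between neighbours.
    side = int(math.isqrt(m))
    levels = range(-(side - 1), side, 2)
    return [(i, q) for i in levels for q in levels]

def mean_power(points):
    # Average of I squared plus Q squared over the constellation.
    return sum(i * i + q * q for i, q in points) / len(points)

print(mean_power(square_qam(16)))  # 10.0
print(mean_power(square_qam(64)))  # 42.0
```
        </ProgramListing>
        <Paragraph>So, for the same point spacing, 64-QAM requires roughly four times the average power of 16-QAM; holding the power constant instead forces the points closer together.</Paragraph>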
      </Section>
      <Section>
        <Title>1.8 Bandwidth</Title>
        <Paragraph>An important point to note with modulation schemes is that although the carrier signal is periodic, the resultant modulated signal is generally not periodic. (It would be periodic if the modulating signal were periodic, for example if it consisted of the repeating series 1, 0, 1, 0, etc.) Therefore, in frequency terms the modulated carrier wave occupies not just one frequency but a range of frequencies. The signal is said to extend over a certain bandwidth, measured in Hz. (Note that the word ‘bandwidth’ is also commonly used to mean the data rate of a digital signal, but here I am talking about <i>analogue</i> bandwidth.)</Paragraph>
        <Paragraph>The bandwidth of an ASK signal is approximately <i>B</i><sub>ASK</sub> = 2<i>B</i>, where <i>B</i> is the bandwidth of the modulating signal. This is also approximately true for PSK; <i>B</i><sub>PSK</sub> = 2<i>B</i>. The bandwidth of FSK depends on how far apart the two frequencies used are. It is approximately:</Paragraph>
        <Equation>
          <Image><i>B</i><sub>FSK</sub> = 2(Δ<i>f</i> + <i>B</i>). </Image>
        </Equation>
        <Paragraph>where 2Δ<i>f</i> is the frequency separation of the highest- and lowest-frequency symbols. </Paragraph>
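        <Paragraph>These approximations can be written as a short sketch. The values of <i>B</i> and Δ<i>f</i> below are hypothetical, chosen only to illustrate the formulas:</Paragraph>
        <ProgramListing>
```python
def bw_ask(b):
    # Approximate bandwidth of ASK (and also PSK): 2B.
    return 2 * b

def bw_fsk(delta_f, b):
    # Approximate bandwidth of FSK: 2 * (delta_f + b), where 2 * delta_f
    # is the separation of the highest- and lowest-frequency symbols.
    return 2 * (delta_f + b)

# Hypothetical example values: B = 10 kHz, delta_f = 5 kHz.
print(bw_ask(10_000.0))           # 20000.0
print(bw_fsk(5_000.0, 10_000.0))  # 30000.0
```
        </ProgramListing>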
        <Paragraph>It follows that any channel conveying useful information has to use a section of the available frequency spectrum, not just one point on it. For a shared medium such as radio, this means there are limits to the number of channels that can be used at the same time and in the same place. This is a fundamental limitation in practical communications. Spectrum use is thus a major resource allocation problem. </Paragraph>
      </Section>
      <Section>
        <Title>1.9 Summary</Title>
        <Paragraph>Modulation is an essential part of digital communications. There are various schemes available and there are design compromises to be made between data rate, bandwidth, the likelihood of errors, complexity and so on. Modulation is used both in wireless and wired communication.</Paragraph>
        <Paragraph>In digital communication, the unit of data transmission is the symbol. A symbol might represent a single bit of data (such as 0 or 1) or several bits (such as 0000 or 0001) depending on how many symbols are used in the modulation scheme.</Paragraph>
        <Paragraph>Quadrature amplitude modulation (QAM) is a very widely used digital modulation system for providing multiple symbols.</Paragraph>
      </Section>
    </Session>
    <Session>
      <Title>2 Error control</Title>
      <Paragraph>Communication channels always suffer from noise, and consequently errors are unavoidable in digital communication. For example, you saw in Section 1 how noise can lead to QAM constellation points being misinterpreted for neighbouring points. The more noise there is, the greater is the likelihood of this kind of misinterpretation. Although the likelihood of errors can be reduced, for example by transmitting a more powerful signal, it can never be eliminated. Strategies such as using more powerful signals can, in any case, lead to further problems, such as increased interference with other communication channels or additional running costs.</Paragraph>
      <Paragraph>When errors occur, they can sometimes be detected or corrected. This is <b>error control</b>. An everyday example of error control is the barcode, as in Figure 9, which shows what is known as an EAN-13 code. These codes are widely used as identifiers for articles such as books and other consumer items. The pattern of narrow and broad lines represents the 13-digit number shown below the lines. Scanners can read the pattern of lines and work out the number represented. Sometimes, though, the pattern is misread – perhaps because the code is unclear or damaged, or the scanner was not used properly. </Paragraph>
      <Paragraph>A misread code produces an incorrect number, which the scanning system recognises as incorrect. Usually a warning sound then indicates that re-scanning is needed. How does the system know that the number has been misread? I will look at this in more detail shortly, but in essence only certain EAN-13 codes are valid, and valid codes are greatly outnumbered by invalid codes. Although a valid code could conceivably be misread as a different but valid code (so that, for example, a box of biscuits might be mistaken for a book), it is much more likely that misreading a code will produce an invalid code – just as a random assortment of letters is more likely to be nonsense than a valid word.</Paragraph>
      <Paragraph>The error control incorporated in EAN-13 codes is known as <b>error detection</b>. Another type of error control is <b>error correction</b>. Error correction not only allows you to know if there is an error in a code, but also corrects the error and recovers the intended data. Later on, we will look at Reed-Solomon coding, which is a widely used method for error correction.</Paragraph>
      <Paragraph>In this audio, Allan Jones talks to David Chapman, the author of Section 2, about some of these issues.</Paragraph>
      <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_002-640x360.mp4" type="video" width="512" x_manifest="tm355_openlearn_002_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="a431e0cb" x_subtitles="tm355_openlearn_002-640x360.srt">
        <Transcript>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Hello, again. I’m Allan Jones, and I'm talking now to David Chapman, who produced the material in Section 2 of this course. Hello, David. </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Hello. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>With digital data, two important concepts relating to its representation and transmission are source coding and channel coding. First of all, what is source coding? </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Source coding is about representing a source, such as text, audio, pictures or video, by patterns of binary bits. So with a text file, we need some way of representing the letters of the alphabet by patterns of bits. The so-called ASCII code is one standardised way of doing that, using seven binary bits for each character. </Remark>
          <Remark>For example, in ASCII code, the letter A is coded to 1100001, and the letter B is coded to 1100010. Anything we want to send over a digital communication system must first be coded like this, be it text, audio, or whatever. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK, so what’s channel coding? </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Having got the source represented by bits, we might need to send those bits over a communication channel. But there are things you can do with the bits before turning them into waveforms on the channel, things that can make the communication work better. This is channel coding. It consists of taking bits or groups of bits and changing them into different groups of bits. </Remark>
          <Remark>So for example, the ASCII code for the letter A, which as I said before, is 1100001, might be changed by adding an extra digit, a one in this case, at the end. With the added digit, what we now call the code word, becomes 11000011. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK, so the net effect of that is you’ve turned a representation of A that used seven binary digits into one that uses eight. So what do you gain by doing that? </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Well, what we gain is the possibility of error control. And in fact, you could say that the main job of channel coding is error control. Error control consists of either error-detection or error-correction. Error-detecting codes enable the receiver to work out if there’s been any errors in transmission. Error-correcting codes go further and enable the receiver to correct errors that have arisen in transmission. </Remark>
          <Remark>Adding the one bit to the ASCII code is an example of an error-detecting code known as a parity check. The added digit is chosen so as to ensure that the number of ones in the code word is even. So in this example, there were originally three ones in the ASCII code for the letter A, and that’s an odd number. Adding another one at the end increases the number of ones to four, which is an even number. With the parity check in place, it’s possible to tell if there’s been a single error in transmission, because a single error anywhere in the code word will result in the received code word having an odd number of ones. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK, you’ve mentioned error-correcting codes as another branch of error control. Error-correction enables the receiver itself to fix errors that arise in transmission without the need to request re-transmission. That sounds too good to be true. So how is it possible? </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Actually, it’s quite easy to see one way to do error-correction. Just send every bit three times. So instead of one, you send three ones, 111. Instead of a zero, you send 000. So what’s transmitted consists entirely of groups of three ones and three zeros. If there’s an error in a group of three ones, the receiver will get two ones and a zero instead. </Remark>
          <Remark>Alternatively, if there’s an error in a group of three zeros, the receiver will get two zeros and a one instead. In either case, you can correct the error by taking a majority decision. A group of two zeros and a one was meant to be three zeros. A group of two ones and a zero was meant to be three ones. Something to notice here is that, once again, error control has required extra bits to be added, as with error-detection. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>So if you can correct errors, why bother trying to avoid them? </Remark>
          <Speaker>DAVID CHAPMAN</Speaker>
          <Remark>Error control, whether it’s error-detection or error-correction, has its limits. Error-correcting codes can't correct an unlimited number of errors. If you get too many errors in a period of time, the error-correcting properties of the code are defeated. That means some errors can still sneak through without being spotted by error-detecting codes. </Remark>
          <Remark>Furthermore, you pay a price for using error-control codes. You'll always need extra bits, such as the addition of the parity bit to the ASCII code that I mentioned before or the tripling of the number of bits by sending every bit three times to get error correction. In that sense, there’s a reduction in efficiency with error control. Advanced error-correcting codes aren’t as inefficient as the example of tripling the number of bits, but they do still require significant numbers of extra bits. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Well, thanks, David. </Remark>
        </Transcript>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_002-320x176_still.jpg" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_002-320x176_still.jpg" width="100%" x_folderhash="596a8165" x_contenthash="aa03d117" x_imagesrc="tm355_openlearn_002-320x176_still.jpg" x_imagewidth="512" x_imageheight="317"/>
        </Figure>
      </MediaContent>
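      <Paragraph>The two channel-coding examples described in the audio – the even-parity check and the send-every-bit-three-times repetition code – can be sketched in Python. This is an illustrative sketch; the function names are my own.</Paragraph>
      <ProgramListing>
```python
def add_even_parity(bits):
    # Append a parity bit so that the total number of 1s is even.
    return bits + [sum(bits) % 2]

def parity_ok(word):
    # A received word passes the check if it has an even number of 1s.
    return sum(word) % 2 == 0

def repeat3(bits):
    # Repetition code: send every bit three times.
    return [b for b in bits for _ in range(3)]

def majority_decode(received):
    # Decode by majority vote over each group of three received bits.
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

a = [1, 1, 0, 0, 0, 0, 1]              # the 7-bit code used for 'A' in the audio
print(add_even_parity(a))              # [1, 1, 0, 0, 0, 0, 1, 1]
print(parity_ok(add_even_parity(a)))   # True
sent = repeat3([1, 0])                 # [1, 1, 1, 0, 0, 0]
sent[1] = 0                            # introduce a single error
print(majority_decode(sent))           # [1, 0]
```
      </ProgramListing>
      <Paragraph>The majority vote corrects any single bit error within each group of three, at the cost of tripling the number of bits sent.</Paragraph>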
      <Section>
        <Title>2.1 EAN-13 code and error detection</Title>
        <Paragraph>In the EAN-13 code, the first 12 digits of the number identify the item the code is attached to, and the final digit is a ‘check digit’. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt1_f019.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt1_f019.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="c1fad602" x_imagesrc="tm355_bk2_pt1_f019.eps.png" x_imagewidth="115" x_imageheight="59"/>
          <Caption><b>Figure 9</b> An example of an EAN-13 barcode</Caption>
        </Figure>
        <Paragraph>The check digit for an EAN-13 code is calculated as follows:</Paragraph>
        <NumberedList>
          <ListItem>Count digit positions from the left to the right, starting at 1.</ListItem>
          <ListItem>Sum all the digits in odd positions. (In the example shown in Figure 9, this is 9 + 8 + 5 + 1 + 2 + 5 = 30 – note that the final 5 is not included since this is the check digit, which is what we are currently trying to calculate.) </ListItem>
          <ListItem>Sum all the digits in even positions and multiply the result by 3. (In the example, this is (7 + 0 + 2 + 4 + 5 + 7) × 3 = 75.)</ListItem>
          <ListItem>Add the results of step 2 and step 3, and take just the final digit (the ‘units’ digit) of the answer. This is equivalent to taking the answer modulo-10. (In the example, the sum is 30 + 75 = 105, so the units digit is 5.) </ListItem>
          <ListItem>If the answer to step 4 was 0, this is the check digit. Otherwise the check digit is given by ten minus the answer from step 4. (In the example, this is 10 – 5 = 5.) </ListItem>
          <ListItem>The check digit is appended to the right of the 12 identification digits. The check digit can have any value from 0 to 9.</ListItem>
        </NumberedList>
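        <Paragraph>The steps above can be sketched as a short Python function (the function name is my own; the 12-digit string in the example is the identification part of the Figure 9 code):</Paragraph>
        <ProgramListing>
```python
def ean13_check_digit(digits12):
    # Steps 1-5: sum the digits in odd positions (counting from 1), add
    # three times the sum of the digits in even positions, take the units
    # digit, then subtract from 10 (with 0 staying as 0).
    digits = [int(d) for d in digits12]
    odd_sum = sum(digits[0::2])       # positions 1, 3, 5, ...
    even_sum = sum(digits[1::2])      # positions 2, 4, 6, ...
    units = (odd_sum + 3 * even_sum) % 10
    return (10 - units) % 10

print(ean13_check_digit("978052142557"))  # 5, as in the worked example
```
        </ProgramListing>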
        <Activity>
          <Heading>Activity 8 Self assessment</Heading>
          <Question>
            <Paragraph>The code below shows the first 12 digits of an EAN-13 code. Note: the hyphen between the 8 and the 0 has no bearing on the code. It is for convenience of reading, separating different elements of the identification. The 978 identifies this item as a book. (Not all EAN-13 codes have a hyphen in the same place.)</Paragraph>
            <Paragraph>Calculate the check digit, and so derive the full EAN-13 code.</Paragraph>
            <Paragraph>978–014102662.</Paragraph>
          </Question>
          <Answer>
            <Paragraph>Adding together the odd digits gives:</Paragraph>
            <Paragraph>9 + 8 + 1 + 1 + 2 + 6 = 27.</Paragraph>
            <Paragraph>Adding together the even digits and multiplying by 3 gives:</Paragraph>
            <Paragraph>(7 + 0 + 4 + 0 + 6 + 2) × 3 = 19 × 3 = 57.</Paragraph>
            <Paragraph>Adding the two together gives: </Paragraph>
            <Paragraph>27 + 57 = 84.</Paragraph>
            <Paragraph>The units digit is 4, so the check digit is given by:</Paragraph>
            <Paragraph>10 − 4 = 6.</Paragraph>
            <Paragraph>The full EAN-13 code is therefore 978–0141026626.</Paragraph>
          </Answer>
        </Activity>
        <Paragraph>One way to check a received EAN-13 code for errors is to remove the received check digit and recalculate it based on the 12-digit identification code. If the recalculated value differs from the received value, there must be an error. </Paragraph>
        <Paragraph>Alternatively, there is a shortcut to checking for errors because of the way the check digit is derived. You take the full 13-digit received code and do steps 1 to 4 from the calculation used above. If the code is correct, the value at step 4 will be 0. If the code is wrong, it will have some other value. </Paragraph>
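        <Paragraph>The shortcut can be expressed as a validity test: apply steps 1 to 4 to all 13 received digits and check that the result is 0. A minimal Python sketch (the function name is my own; the example code is the one derived in Activity 8):</Paragraph>
        <ProgramListing>
```python
def ean13_valid(code13):
    # Steps 1-4 applied to the full 13-digit code: the units digit of the
    # odd-position sum plus three times the even-position sum must be 0.
    digits = [int(d) for d in code13]
    total = sum(digits[0::2]) + 3 * sum(digits[1::2])
    return total % 10 == 0

print(ean13_valid("9780141026626"))  # True
print(ean13_valid("9780141026627"))  # False (last digit corrupted)
```
        </ProgramListing>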
        <Activity>
          <Heading>Activity 9 Self assessment</Heading>
          <Question>
            <Paragraph>Check whether the following codes are valid:</Paragraph>
            <NumberedList class="lower-alpha">
              <ListItem>978–0521425575</ListItem>
              <ListItem>978–1405322274</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <NumberedList class="lower-alpha">
              <ListItem>978–0521425575 is a 12-digit identification code 978–052142557 with a check digit of 5. Recalculating the check digit from the identification code gives 5, so the code is correct. <Paragraph>Alternatively, using the ‘shortcut’, we take all 13 digits and go through steps 1 to 4.</Paragraph><Paragraph>Adding together the odd digits gives:</Paragraph><Paragraph>9 + 8 + 5 + 1 + 2 + 5 + 5 = 35.</Paragraph><Paragraph>Adding together the even digits and multiplying by 3 gives:</Paragraph><Paragraph> (7 + 0 + 2 + 4 + 5 + 7) × 3 = 25 × 3 = 75.</Paragraph><Paragraph>Adding the two together gives:</Paragraph><Paragraph>35 + 75 = 110.</Paragraph><Paragraph>The units digit is 0, which shows that this is a valid EAN-13 code.</Paragraph></ListItem>
              <ListItem>978–1405322274 is a 12-digit identification code 978–140532227 with a check digit of 4. Recalculating the check digit from the identification code gives 0, so the code is incorrect. <Paragraph>Alternatively, using the ‘shortcut’, we take all 13 digits and go through steps 1 to 4. This gives 34 + 20 × 3 = 94, which results in a units digit of 4. Since this is not zero, this is not a valid EAN-13 code.</Paragraph></ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>One thing to notice about EAN-13 is that the numbers were treated as a string of separate digits, not as a single number. It was, for example, 9, 7, 8, 0, 5, 2, 1, 4, 2, 5, 5, 7, not 978 052 142 557 (i.e. <i>not</i> nine hundred and seventy-eight billion, fifty-two million, one hundred and forty-two thousand, five hundred and fifty-seven). </Paragraph>
        <Paragraph>In EAN-13 the digits are denary: numbers to base 10. In base 10 a digit can be any one of 10 symbols, which we represent as 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. (The word ‘symbol’ is used here in a different, though parallel, sense from the symbols of modulation schemes.) </Paragraph>
        <Paragraph>The check digit included in the 13-digit EAN code is an example of redundancy. This is a standard term for bits appended to data for error control. Unfortunately the term ‘redundancy’ suggests that these additional bits serve no purpose, which is not true. They are redundant, though, in the sense that they are not part of the message data.</Paragraph>
        <Paragraph>All forms of error control involve the augmentation of a message with error control bits, which could be described as adding redundancy. If the number of bits (or bytes) in the message is <i>k</i>, and the augmented length is <i>n</i> bits (or bytes), the ratio <i>k</i>/<i>n</i> is known as the code rate. This is an important parameter.</Paragraph>
        <Paragraph>Code rate is a measure of how much redundancy has been added to the code. If lots of check digits (that is, lots of redundancy) are appended to a small number of message digits, the code rate will be small (much less than 1). If only a few check digits are appended to a big message, the code rate will be close to 1. (It can never exceed 1.) Code rates found in, for example, mobile communications and WiFi typically range from 1/4 to 5/6.</Paragraph>
        <Paragraph>Sometimes a code is specified by the numbers (<i>n</i>, <i>k</i>), in that order, with brackets around them and a comma between the numbers. Codes which take <i>k</i> message digits and create an <i>n</i>-digit code word are described as (<i>n</i>, <i>k</i>) block codes.</Paragraph>
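        <Paragraph>As a small illustration of code rate (the values are hypothetical, and Python’s fractions module keeps the rate exact):</Paragraph>
        <ProgramListing>
```python
from fractions import Fraction

def code_rate(n, k):
    # Code rate of an (n, k) block code: k message digits out of n sent.
    return Fraction(k, n)

print(code_rate(6, 5))  # 5/6, at the upper end of the range quoted above
print(code_rate(8, 2))  # 1/4, heavily protected: mostly check digits
```
        </ProgramListing>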
        <Activity>
          <Heading>Activity 10 Self assessment</Heading>
          <Question>
            <NumberedList class="lower-alpha">
              <ListItem>Describe an EAN-13 code using the (<i>n</i>, <i>k</i>) notation.</ListItem>
              <ListItem>What is the code rate of an EAN code?</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <NumberedList class="lower-alpha">
              <ListItem>There are 12 identification digits in an EAN-13 code, to which one check digit is added, so it is a (13, 12) code.</ListItem>
              <ListItem>The code rate of an EAN-13 code is <i>k</i>/<i>n</i> = 12/13.</ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>Next, we will look at how Reed–Solomon codes are used for error control on data that is structured into sequences of eight-bit bytes.</Paragraph>
      </Section>
      <Section>
        <Title>2.2 Reed-Solomon codes and error correction</Title>
        <Paragraph>Reed–Solomon codes (RS codes) are error-correcting codes invented in 1960 by Irving S. Reed and Gustave Solomon of the Massachusetts Institute of Technology (MIT). They have a wider range of application than EAN-13 codes. Reed–Solomon codes are used in compact discs, Blu-ray discs, DSL broadband, and in many other devices and media.</Paragraph>
        <Paragraph>RS codes are (<i>n</i>, <i>k</i>) block codes (as are EAN-13 codes), but, whereas EAN-13 codes operate on denary digits, the digits of RS codes are bytes. A byte is a group of eight bits, but in the context of RS codes, bytes are thought of as single entities. In the same way that there are ten different possible denary digits (the digits 0–9), there are 256 different possible bytes. They are the different possible combinations of eight bits: 00000000, 00000001, 00000010 ... 11111111. So, whereas <i>n</i> and <i>k</i> are the numbers of denary digits for EAN-13 codes (13 and 12 respectively), they are numbers of bytes for RS codes. (In fact the theory of RS codes is very general and can be used with other types of digits, but in practical applications they are used with bytes.) </Paragraph>
        <Paragraph>In RS codes, <i>k</i> bytes of message (referred to as the message digits) have appended to them additional bytes (the check digits) to create a code word containing a total of <i>n</i> bytes. RS code words are much bigger than EAN-13 code words, and <i>n</i> can be up to 255 bytes. </Paragraph>
        <Activity>
          <Heading>Activity 11 (exploratory)</Heading>
          <Question>
            <Paragraph>How many bits are there in a block of 255 bytes?</Paragraph>
          </Question>
          <Answer>
            <Paragraph>There are 8 × 255 = 2040 bits.</Paragraph>
          </Answer>
        </Activity>
        <Paragraph>It is not always convenient to use the full block size, and RS codes can be <b>shortened</b>. Conceptually, the block is ‘padded’ with 0s, in other words, some of the message bits are replaced by 0s. Since the decoder knows they are 0s, they do not need to be sent, and so the block size actually used is reduced. </Paragraph>
        <Paragraph>As RS codes are error-correcting codes, the receiver can put right errors. So, if one of the bytes in the message was sent as 01100111, for example, but was received completely differently, such as 10000100, the RS decoder will be able to change it back to the correct value: 01100111. However, error correction can only be done if there are not too many errors.</Paragraph>
        <Paragraph>There are two steps to correcting errors in RS codes:</Paragraph>
        <NumberedList>
          <ListItem>Identify which of the digits (bytes) have errors.</ListItem>
          <ListItem>Correct those digits.</ListItem>
        </NumberedList>
        <Paragraph>Both steps involve mathematical procedures. Unlike the procedures used with EAN-13 codes, the maths is done on bytes rather than denary digits, and the theory and methods are much more advanced than those used with EAN-13 for detecting errors. Let’s look at the ‘dimensions’ of the codes and how many errors can be corrected. </Paragraph>
        <Paragraph>RS codes can be designed with differing error-correction ability, depending upon the number of message digits in the block. For a given block size, the more message digits there are, the less redundancy there can be and therefore the fewer errors can be corrected. There will be more details on this trade-off shortly. </Paragraph>
      </Section>
      <Section>
        <Title>Reed-Solomon codes and error correction (continued)</Title>
        <Paragraph>An interesting feature of RS codes is that if you already know the locations of the errors – that is, which bytes contain errors – more errors can be corrected than if you don’t know where the errors are. In effect, all the information provided by the check digits can be used in correcting the bytes, instead of some of it being needed to identify which bytes have errors. It might seem a strange idea that the location of the error could already be known, but this feature is exploited in some practical applications – one of which, Blu-ray discs, will be explained in a little more detail shortly. This type of error, where the location is known, is called an <b>erasure</b>. Bytes that have errors where the location is not known in advance are just called <b>symbol errors</b>. (Confusingly, ‘symbol’ and ‘digit’ are used to mean the same thing in this context, and the symbols/digits are bytes.)</Paragraph>
        <Paragraph>In general, in each block of <i>n</i> digits an (<i>n</i>, <i>k</i>) RS code can: </Paragraph>
        <BulletedList>
          <ListItem>correct up to <InlineEquation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                  <mrow>
                    <mfrac>
                      <mrow>
                        <mi>n</mi>
                        <mo>−</mo>
                        <mi>k</mi>
                      </mrow>
                      <mn>2</mn>
                    </mfrac>
                  </mrow>
                </math></MathML></InlineEquation> symbol errors</ListItem>
          <ListItem>correct up to <i>n</i> − <i>k</i> erasures. </ListItem>
        </BulletedList>
        <Paragraph>More generally, if there are both erasures and symbol errors to be corrected, the code can correct <i>ν </i>symbol errors and <i>ρ </i>erasures, where: </Paragraph>
        <Paragraph>2<i>ν</i> + <i>ρ</i> ≤ <i>n</i> − <i>k</i>.</Paragraph>
        <Activity>
          <Heading>Activity 12 (self-assessment)</Heading>
          <Question>
            <Paragraph>A popular base-256 (byte) RS code is the (255, 223) code.</Paragraph>
            <NumberedList class="lower-alpha">
              <ListItem>How many symbol errors can this code correct? How many bits is that?</ListItem>
              <ListItem>How many erasures can it correct? How many bits is that?</ListItem>
              <ListItem>What is the code rate and the redundancy of this code?</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <Paragraph>For this code, <i>n</i> = 255 and <i>k</i> = 223. </Paragraph>
            <NumberedList class="lower-alpha">
              <ListItem>The number of symbol errors it can correct is given by:<Equation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                      <mrow>
                        <mfrac>
                          <mrow>
                            <mi>n</mi>
                            <mo>−</mo>
                            <mi>k</mi>
                          </mrow>
                          <mn>2</mn>
                        </mfrac>
                        <mo>=</mo>
                        <mfrac>
                          <mrow>
                            <mn>255</mn>
                            <mo>−</mo>
                            <mn>223</mn>
                          </mrow>
                          <mn>2</mn>
                        </mfrac>
                        <mo>=</mo>
                        <mn>16</mn>
                      </mrow>
                    </math></MathML></Equation><Paragraph>So the code can correct up to 16 symbol errors (16 bytes) in each block of 255 bytes. In terms of bits, it can correct up to 16 × 8 = 128 bits in each block of 255 × 8 = 2040 bits. </Paragraph></ListItem>
              <ListItem>The number of erasures the code can correct is given by:<Paragraph><i>n</i> – <i>k</i> = 255 – 223 = 32.</Paragraph><Paragraph>So the code can correct up to 32 erasures (32 bytes) in each block of 255 bytes. In terms of bits, it can correct up to 32 × 8 = 256 bits in each block of 255 × 8 = 2040 bits. </Paragraph></ListItem>
              <ListItem>The code rate is 223/255 = 0.87 (to 2 s.f.). The redundancy (to 2 s.f.) is<Paragraph><InlineEquation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                        <mrow>
                          <mi>R</mi>
                          <mo>=</mo>
                          <mfrac>
                            <mrow>
                              <mi>n</mi>
                              <mo>−</mo>
                              <mi>k</mi>
                            </mrow>
                            <mi>n</mi>
                          </mfrac>
                          <mo>=</mo>
                          <mfrac>
                            <mrow>
                              <mn>255</mn>
                              <mo>−</mo>
                              <mn>223</mn>
                            </mrow>
                            <mrow>
                              <mn>255</mn>
                            </mrow>
                          </mfrac>
                          <mo>=</mo>
                          <mfrac>
                            <mrow>
                              <mn>32</mn>
                            </mrow>
                            <mrow>
                              <mn>255</mn>
                            </mrow>
                          </mfrac>
                          <mo>=</mo>
                          <mn>0.13</mn>
                        </mrow>
                      </math></MathML></InlineEquation></Paragraph><Paragraph>which is 13%.</Paragraph></ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>Activity 12 showed that RS codes can correct a large number of bits. These errored bits can all be adjacent in the received bit sequence, so they could form long bursts of errors. This ability to deal with long bursts of errors is a key feature of RS codes, which is the reason for their selection in many applications. </Paragraph>
        <Paragraph>Blu-ray discs, for example, use RS codes in a way that not only exploits their inherently good burst-error protection, but also extends it through the way data is written on the disc. Two RS codes are used, both working with bytes (base 256) and shortened from the (255, 223) code described in Activity 12, and they encode different kinds of data. One code, referred to as the long-distance code (LDC), protects user data – that is, the content of the disc intended for its user. This uses a (248, 216) RS code. The other code, called the burst-indicating subcode (BIS), encodes information needed for addressing and control within the disc. This is a (62, 30) RS code. </Paragraph>
        <Paragraph>User data on the disc is organised into 64 KB chunks or ‘clusters’ (see box below). After LDC encoding, the clusters are interleaved. Interleaving is a common way of combating bursts of noise that could affect several consecutive data units (such as clusters in the present context, or frames in other contexts). Prior to transmission, the order of the clusters is shuffled. When the clusters are restored to their correct order by the receiver, any noise bursts should have their effect dispersed, so that affected clusters are distributed among unaffected clusters, rather than being consecutive. So, if there is a mark on the disc, instead of possibly obliterating an entire coded block of data, it should cause less damage to each of several blocks. This by itself increases the burst lengths that can be corrected, but the BIS helps further. </Paragraph>
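<Paragraph>The dispersal effect of interleaving can be sketched in a few lines of Python. This is my own minimal block-interleaver illustration, not the actual Blu-ray scheme: four ‘blocks’ of six symbols are interleaved, a burst corrupts four consecutive transmitted symbols, and after de-interleaving each block has lost at most one symbol.</Paragraph>

```python
def interleave(data, rows, cols):
    """Block interleaver: write row by row, read column by column."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Inverse operation: restores the original row-by-row order."""
    assert len(data) == rows * cols
    return [data[c * rows + r] for r in range(rows) for c in range(cols)]

# Four 'blocks' of six symbols each; each symbol is labelled with its block number
blocks = [b for b in range(4) for _ in range(6)]
tx = interleave(blocks, rows=4, cols=6)

# A noise burst wipes out four consecutive transmitted symbols...
rx = list(tx)
for i in range(8, 12):
    rx[i] = 'X'

# ...but after de-interleaving the damage is spread across the blocks
damaged = deinterleave(rx, rows=4, cols=6)
per_block = [damaged[b * 6:(b + 1) * 6].count('X') for b in range(4)]
print(per_block)  # [1, 1, 1, 1] - each block loses only one symbol
```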
        <Box>
          <Paragraph><b> Working with G, M, K in memory sizes</b></Paragraph>
          <Paragraph>The 64 KB clusters on Blu-ray discs contain 65 536 bytes, because in memory sizes the multipliers K, M and G mean the powers of 2 that fall closest to 10<sup>3</sup>, 10<sup>6</sup> and 10<sup>9</sup> respectively. Thus: </Paragraph>
          <Paragraph>K is 2<sup>10</sup> = 1024 (i.e. close to 10<sup>3</sup> = 1000) </Paragraph>
          <Paragraph>M is 2<sup>20</sup> = 1 048 576 (i.e. close to 10<sup>6</sup> = 1 000 000) </Paragraph>
          <Paragraph>G is 2<sup>30</sup> = 1 073 741 824 (i.e. close to 10<sup>9</sup> = 1 000 000 000). </Paragraph>
          <Paragraph>Notice the relationship between K, M and G: each is 2<sup>10</sup> = 1024 times the previous one. So if you have a number expressed using the multiplier M, say 2 Mbits, you can express this using the multiplier K by multiplying by 1024. Thus 2 Mbits = 2048 Kbits. Similarly, you can multiply a number expressed using the multiplier G by 1024 in order to express it using the multiplier M. Loosely, we might say G = 1024M and M = 1024K, and G = M × K. </Paragraph>
          <Paragraph>Note that K has a different meaning from k, which always means 10<sup>3</sup> rather than 2<sup>10</sup>.</Paragraph>
        </Box>
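<Paragraph>The relationships in the box above can be checked directly (an illustrative aside in Python):</Paragraph>

```python
# Binary multipliers used for memory sizes
K = 2 ** 10   # 1024, close to 10**3
M = 2 ** 20   # 1 048 576, close to 10**6
G = 2 ** 30   # 1 073 741 824, close to 10**9

print(64 * K)       # 65536 - bytes in a 64 KB Blu-ray cluster
print(2 * M // K)   # 2048 - 2 Mbits expressed in Kbits
print(G == M * K)   # True - G is M times K
```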
        <Paragraph>The burst-indicating subcode, as its name suggests, is involved in detecting error bursts. The BIS data is recorded on the disc at frequent and regular intervals, so that there is a short length of encoded user data (38 bytes) between single bytes of BIS data. This can be seen in Figure 10, which shows the structure of a 64 KB ECC (error-correcting code) cluster, including the so-called ‘picket codes’ of the BIS data. (This particular cluster has 496 rows and 155 columns. ‘Rows’ and ‘columns’ here refer to the arrangement of bytes on a Blu-ray disc, and are not directly connected with the RS codes.)</Paragraph>
        <Paragraph>If it is found that two or more consecutive bytes of BIS data have been corrupted, there is a fair chance that the LDC data between them will have been corrupted as well. This information is passed to the LDC decoder, which then treats the bytes in question as erasures, allowing it to correct twice as many of them as it could have done had it not known they were at fault. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt1_f020.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt1_f020.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="3b39b659" x_imagesrc="tm355_bk2_pt1_f020.eps.png" x_imagewidth="391" x_imageheight="132"/>
          <Caption><b>Figure 10</b> Data structure on a Blu-ray disc</Caption>
          <Description><Paragraph>This diagram shows a rectangle with short vertical sides and longer horizontal sides. It is divided into seven sections. Reading from left to right, these are labelled LDC, BIS, LDC, BIS, LDC, BIS, LDC. The LDC sections are white and the BIS sections are shaded blue.</Paragraph><Paragraph>The first LDC section has a bracket above it, with the number 38. This represents 38 bytes of encoded user data. </Paragraph><Paragraph>The first BIS section has a bracket above it, with the number 1. This represents a single byte of BIS data.</Paragraph><Paragraph>The entire lower edge of the rectangle has a bracket below it, with the number 155. This represents the number of columns.</Paragraph><Paragraph>The left vertical side of the rectangle has a bracket beside it, with the number 496. This represents the number of rows.</Paragraph></Description>
        </Figure>
      </Section>
      <Section>
        <Title>2.3 Summary</Title>
        <Paragraph>In this part you have seen examples of the two broad categories of error control codes: an error-detecting code (the EAN-13 code) and an error-correcting code (RS code).</Paragraph>
        <Paragraph>All error control involves the addition of redundancy to a message. If the number of message bits (or bytes) is <i>k</i>, and the augmented length after the addition of redundancy is <i>n</i> bits (or bytes), the ratio <i>k</i>/<i>n</i> is the code rate.</Paragraph>
      </Section>
    </Session>
    <Session>
      <Title>3 Perceptual source-coding and lossy compression</Title>
      <Section>
        <Title>3.1 Introduction</Title>
        <Paragraph>Source coding is the representation of a phenomenon such as a sound or an image in a form suitable for communication or storage. You are probably familiar with the example of digital sampling for creating digital representations of sounds or images. Prior to transmission, such representations often have error control incorporated, as you saw earlier. There is more to source coding than just sampling, though, because of the desirability of representing a source as efficiently as possible. Usually this means using as few bits as possible, consistent with achieving a desired level of fidelity to the original source. This raises the issue of data compression, and in particular lossy compression, which I will look at shortly. Lossy compression is widely used in the source coding of sound, images and video. MP3 and JPEG files use lossy compression. Usually this compression relies on human perceptual characteristics that enable some data to be discarded with no apparent degradation.</Paragraph>
        <Paragraph>In this introductory audio, Allan Jones talks to Laurence Dooley, the author of Section 3, about some of the issues from this section.</Paragraph>
        <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_003-640x360.mp4" type="video" width="512" x_manifest="tm355_openlearn_003_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="5594d2c0" x_subtitles="tm355_openlearn_003-640x360.srt">
          <Transcript>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>Hello again. I'm Allan Jones, and I'm talking now to Laurence Dooley, who produced the material for section 3 of this course. Hello, Laurence. </Remark>
            <Speaker>LAURENCE DOOLEY</Speaker>
            <Remark>Hello. </Remark>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>You're concerned in section 3 with lossy compression, which is a form of source coding. But there's another form of source coding which you don't deal with, called lossless compression. What's the difference between lossless compression and lossy compression? </Remark>
            <Speaker>LAURENCE DOOLEY</Speaker>
            <Remark>With lossless compression, the process is reversible. So you're guaranteed to get an exact version of the original source, so you get back what you had before you applied any compression, just as you would with, for instance, a zip file. The same is not true with lossy compression. Some details in the source are irredeemably lost during the compression process, and the reconstructed version will only be an approximation of the original source, albeit, often, it is interpreted by us humans as being exactly the same, like in MP3 or JPEG. </Remark>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>Lossy compression sounds like the poor relation. It sounds as though using lossy compression is inferior to lossless compression because it discards information. </Remark>
            <Speaker>LAURENCE DOOLEY</Speaker>
            <Remark>Yes. It's true that lossy compression discards information, but I certainly wouldn't say that it's inferior. The trick with lossy compression is that what is lost or discarded are things that are not perceived. For multimedia type signals, the way we humans interpret sounds or images can be exploited in the compression process. For example, in audio coding, we can afford to lose sounds that we can't hear, while in image and video coding, we can discard things we're unable to see. </Remark>
            <Remark>So even though some detail is lost, hence the term "lossy coding," for us humans, there's no change in what we perceive, either audibly or visually. This approach to source compression is known as perceptual coding. This is distinct from the data coding you encountered in lossless compression, because rather than only focusing on exploiting data patterns or long strings of binary digits, we're interested in our perception of the source. </Remark>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>OK. So why would you bother? Why not use lossless compression? </Remark>
            <Speaker>LAURENCE DOOLEY</Speaker>
            <Remark>Well, it's all to do with data, bit rates, and file sizes. Lossy methods can compress files much more than lossless compression. They also offer much greater flexibility, as there's now an inherent trade-off between file size or transmission bit rate and the perceived quality. We can accept some reduction in the perceived quality in exchange for higher compression ratios or smaller file sizes. </Remark>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>OK. Can you say something about how lossy coding is actually done? </Remark>
            <Speaker>LAURENCE DOOLEY</Speaker>
            <Remark>Yes. Very often, you find that part of the process involves a change of domain. For instance, some of the processing involved in creating MP3 files is done in the frequency domain, to identify those audio frequencies which we perceive and those we don't, and consequently, those we don't need to bother encoding. </Remark>
            <Remark>The same is true for image coding, like JPEG. Part of the processing is done in what we call the spatial domain, where we now consider spatial frequencies, a concept which is not so straightforward to grasp. But if you remember that sine waves in space can have different wavelengths, then imagine that the spatial variations in an image can be resolved into sine waves having different wavelengths. </Remark>
            <Remark>The key idea to remember about part 3 is that of perceptual coding, which as I mentioned before, is especially applicable to multimedia content like audio, images, and video. If the person at the end of a communications link is not going to be able to either hear or see something, then why bother coding and transmitting this information? </Remark>
            <Speaker>ALLAN JONES</Speaker>
            <Remark>Thanks. </Remark>
          </Transcript>
          <Figure>
            <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_003-320x176_still.jpg" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_003-320x176_still.jpg" width="100%" x_folderhash="596a8165" x_contenthash="4be8c3c6" x_imagesrc="tm355_openlearn_003-320x176_still.jpg" x_imagewidth="512" x_imageheight="317"/>
          </Figure>
        </MediaContent>
      </Section>
      <Section>
        <Title>3.2 Sampling and quantisation</Title>
        <Paragraph>Sampling is the process of converting a continuous analogue time signal into a discrete time representation. It is one of the first stages in converting an analogue signal (of which sound is a prime example) to a digital equivalent.</Paragraph>
        <Paragraph>Figure 11 shows an analogue signal sampled at regular intervals of <i>T</i><sub>s</sub>. During sampling, the value of the source signal is measured. In principle each sample can perfectly represent the value of the waveform at that instant.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f005.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f005.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="0ff8850c" x_imagesrc="tm355_bk2_pt3_f005.eps.png" x_imagewidth="455" x_imageheight="176"/>
          <Caption><b>Figure 11</b> Sampling an analogue signal</Caption>
        </Figure>
        <Paragraph>To represent the analogue signal satisfactorily, certain sampling criteria must be met. One relates to the rate of sampling. This is governed by the <b>sampling theorem, </b>which defines the unique relationship between the source signal’s bandwidth <i>f</i><sub>b</sub> and the sampling frequency <InlineEquation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">
                <mrow>
                  <msub>
                    <mi>f</mi>
                    <mi>s</mi>
                  </msub>
                  <mo>=</mo>
                  <mfrac>
                    <mn>1</mn>
                    <mrow>
                      <msub>
                        <mi>T</mi>
                        <mi>s</mi>
                      </msub>
                    </mrow>
                  </mfrac>
                </mrow>
              </math></MathML></InlineEquation>, where <i>T</i><sub>s</sub> is the sampling period. There is no loss of information between the original and sampled signals <i>if and only if </i>the signal is sampled at a rate that is at least twice <i>f<sub>b</sub></i>: </Paragraph>
        <Equation>
          <Image><i>f</i><sub>s</sub> ≥ 2<i>f</i><sub>b</sub>.</Image>
        </Equation>
        <Paragraph>So, for example, a signal with a bandwidth of 4 kHz must be sampled at least 8000 times per second to preserve all the signal’s information. As sampling rates are commonly given in kHz or MHz, the minimum sampling rate here would be given as 8 kHz.</Paragraph>
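<Paragraph>A minimal Python sketch of this calculation (the helper names are my own; the folding formula gives the standard apparent frequency of an under-sampled tone, anticipating the discussion of aliasing below):</Paragraph>

```python
def min_sampling_rate_hz(bandwidth_hz):
    """Sampling theorem: the sampling rate must be at least twice the bandwidth."""
    return 2 * bandwidth_hz

def alias_frequency_hz(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz when sampled at fs_hz.

    A tone above fs/2 is 'folded back' into the range 0 .. fs/2 and
    becomes indistinguishable from a lower-frequency tone.
    """
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

print(min_sampling_rate_hz(4000))      # 8000 - a 4 kHz signal needs 8 kHz sampling
print(alias_frequency_hz(3000, 8000))  # 3000 - below fs/2, so no aliasing
print(alias_frequency_hz(7000, 8000))  # 1000 - a 7 kHz tone aliases to 1 kHz
```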
        <Paragraph>If the sampling theorem is not upheld, <b>aliasing</b> occurs. A frequently encountered visual example of aliasing is the apparent backward rotation of spoked wheels in films showing vehicles travelling forwards. Audio aliasing, in which a spurious signal is represented by the samples (Figure 12), can be heard in the following activity.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f006.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f006.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="12748bc1" x_imagesrc="tm355_bk2_pt3_f006.eps.png" x_imagewidth="449" x_imageheight="186"/>
          <Caption><b>Figure 12</b> Aliasing of an under-sampled signal</Caption>
          <Description><Paragraph>This graph depicts a sine wave. It shows a series of regularly spaced, alternating peaks and troughs. There are nine cycles in total. The sine wave is blue and is labelled as ‘under-sampled sine wave’. </Paragraph><Paragraph>Eleven sampling points are shown on the sine wave, represented by dots at regular intervals. The dots are joined to form another smooth sine wave, this time drawn in red, which is labelled ‘alias signal’. The alias signal has only one cycle compared with the nine cycles of the original sine wave. </Paragraph></Description>
        </Figure>
        <Activity>
          <Heading>Activity 13 Exploratory</Heading>
          <Question>
            <Paragraph>In this audio track you hear audio aliasing. A short piece of speech (‘In my garden I have an apple tree, a hazel tree and a pine tree’) is heard six times. On each repetition the sampling rate is half the rate of the one before. As the sampling rate reduces, aliasing becomes more prominent. The six sampling rates are: 44.1 kHz, 22.05 kHz, 11.025 kHz, 5.512 kHz, 2.756 kHz and 1.378 kHz. </Paragraph>
            <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_activity_3.1.mp3" type="audio" x_manifest="tm355_openlearn_activity_3.1_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="6621df58"/>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>3.3 Quantisation</Title>
        <Paragraph>The sample values measured during sampling must be quantised to produce a digital representation of the analogue signal. That is, each value is approximated to its nearest quantisation level. Quantisation levels are pre-determined levels, like the rungs of a ladder, between the lowest possible sample value and the highest. The closeness of the approximation between a sample value and its nearest quantisation level depends on the number of quantisation levels available. For example, using a thousand levels involves less approximation than using a hundred levels.</Paragraph>
        <Paragraph>Each quantisation level is represented by a unique binary number. The number of levels is therefore related to the number of bits, <i>n</i>, used in the binary numbers that represent the quantisation levels. For example, using 3 bits provides eight (2<sup>3</sup>) discrete levels represented by: 000, 001, 010, …, 111. Eight bits and twelve bits will quantise the signal into 256 (2<sup>8</sup>) and 4096 (2<sup>12</sup>) levels respectively. In general, an <i>n</i>-bit <b>analogue-to-digital converter (ADC)</b> provides 2<i><sup>n</sup></i> quantisation levels.</Paragraph>
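<Paragraph>The mapping from a sample value to the nearest of the 2<sup><i>n</i></sup> levels can be sketched as follows (an illustrative Python aside; the ±1 V range and the evenly spaced level convention are my own assumptions, not a specification of any particular ADC):</Paragraph>

```python
def quantise(sample, n_bits, v_min=-1.0, v_max=1.0):
    """Approximate a sample by the nearest of 2**n_bits evenly spaced levels."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / (levels - 1)
    index = round((sample - v_min) / step)  # nearest level number, 0 .. levels-1
    return index, v_min + index * step      # (binary code value, quantised voltage)

# A 3-bit ADC gives 8 levels between -1 V and +1 V
index, value = quantise(0.40, 3)
print(index, value)  # level 5, i.e. binary 101, at about 0.43 V
print(2 ** 16)       # 65536 levels for 16-bit quantisation
```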
        <Activity>
          <Heading>Activity 14 Self assessment</Heading>
          <Question>
            <Paragraph>Compact disc audio uses 16-bit quantisation. </Paragraph>
            <NumberedList class="lower-alpha">
              <ListItem>How many quantisation levels are there?</ListItem>
              <ListItem>One quantisation level is designated as zero. Can there be equal numbers of levels for positive and negative values?</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <NumberedList class="lower-alpha">
              <ListItem>For 16-bit quantisation, the number of quantisation levels is 2<sup>16</sup>. This is 65 536.</ListItem>
              <ListItem>The number of quantisation levels in (a) is even, so there is no central value that can be designated as zero with equal numbers of positive and negative levels on either side. This is true whatever number of bits is used. For example, 3 bits gives 8 quantisation levels. If level 5 is taken as zero, there are four levels (1 to 4) on one side but three (6 to 8) on the other.</ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>Quantisation inevitably introduces errors because the analogue signal can potentially take infinitely many values, whereas the number of quantisation levels is finite. You could compare this to a person standing on a staircase. The person’s height above the floor is restricted to the possibilities provided by the stairs. An object released over the side, however, passes through infinitely many heights as it falls to the floor.</Paragraph>
        <Paragraph>Figure 13 shows the effect of differing numbers of quantisation levels. A sine wave of amplitude 1 V is shown quantised into 8 levels (<i>n</i> = 3) in (a), 32 levels (<i>n</i> = 5) in (b), 256 levels (<i>n</i> = 8) in (c) and 65 536 levels (<i>n</i> = 16) in (d). The corresponding digital representations are shown in red and resemble steps outlining the sine wave. The difference between the original and digital signals is called <b>quantisation noise (quantisation error) </b>and is displayed in green. As <i>n</i> increases, the corresponding digital representation improves in fidelity because the step size between adjacent discrete levels, and hence the quantisation noise, is reduced. In Figures 13(c) and (d) the quantisation error is visually very small; nevertheless there will still be some quantisation noise present. This is why quantisation is lossy – in contrast to sampling, which is lossless provided the sampling theorem is upheld. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f007.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f007.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="908022c5" x_imagesrc="tm355_bk2_pt3_f007.eps.png" x_imagewidth="507" x_imageheight="412"/>
          <Caption><b>Figure 13</b> Examples of quantisation error (green) for different ADCs: (a) 3-bit, (b) 5-bit, (c) 8-bit and (d) 16-bit</Caption>
        </Figure>
        <Paragraph>Quantisation is an example of lossy encoding because information is irretrievably lost when infinitely variable sample values are approximated by a set of quantisation levels. In audio work, 16 bits per sample is usual for the final files, giving 65 536 quantisation levels. The choice of 16 bits per sample is an example of perceptual, lossy encoding, in the sense that it is based on properties of human hearing. Using more bits per sample is not considered to give any audible benefit and therefore results in unnecessarily large files. (Professional recording and editing often use more than 16 bits per sample, but this is not so much for higher fidelity as because some editing processes cause a deterioration of fidelity.) However, there is more to perceptual encoding than choosing a set of quantisation levels.</Paragraph>
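<Paragraph>As a rough illustration of why more bits mean less quantisation noise: the worst-case error is half the step between adjacent levels, and that step shrinks as the number of levels grows (a Python aside using my own evenly spaced level convention):</Paragraph>

```python
def max_quantisation_error(n_bits, v_min=-1.0, v_max=1.0):
    """Worst-case quantisation error: half the step between adjacent levels."""
    step = (v_max - v_min) / (2 ** n_bits - 1)
    return step / 2

# The error shrinks rapidly with each extra bit, matching the improving
# fidelity seen across Figures 13(a) to (d)
for n in (3, 5, 8, 16):
    print(n, max_quantisation_error(n))
```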
      </Section>
      <Section>
        <Title>3.4 Perceptual encoding</Title>
        <Paragraph>Sounds of certain frequencies or certain colours are perceived better than others. Useful reductions of file size or data rate can often be achieved if this fact is exploited during encoding of the source. MP3 music files for example are typically one-tenth of the size of equivalent, uncompressed music files (such as CD files).</Paragraph>
        <Paragraph>Humans are more sensitive to frequencies in the range 1 to 5 kHz than to those outside this range. This is shown in Figure 14. The red line is the threshold of hearing. Sounds below the threshold are inaudible. The threshold is lowest between 1 and 5 kHz. It rises above 5 kHz and below 1 kHz. At these frequencies, the quietest audible sounds are louder than the quietest audible sounds between 1 kHz and 5 kHz. </Paragraph>
        <Paragraph>In Figure 14, two single-frequency tones A and B are shown with the same amplitude, but A is audible and B is inaudible.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f032.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f032.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="b8fded88" x_imagesrc="tm355_bk2_pt3_f032.eps.png" x_imagewidth="431" x_imageheight="189"/>
          <Caption><b>Figure 14</b> Hearing sensitivity threshold response curve of the human ear with two equal-amplitude frequency tones, A and B</Caption>
          <Description><Paragraph>This line graph has a Y axis labelled ‘relative signal amplitude’, measured in decibels, and has points minus 100 (bottom) and 0 (top) marked.</Paragraph><Paragraph>The X axis is labelled as frequency, measured in kilohertz. The scale is not linear, more of a logarithmic scale. From left to right it has points 0.02, 1, 5 and 20 marked.</Paragraph><Paragraph>A smooth analogue waveform is shown, starting at a position 0 on the Y axis. It is labelled ‘hearing sensitivity threshold’. It curves and drops smoothly to a trough at position 1 on the X axis and minus 100 on the Y axis. It remains at minus 100 until about position 5 on the X axis, then rises sharply (note also the logarithmic scale) to a maximum of 0 on the Y axis at position 20 on the X axis. </Paragraph><Paragraph>Two blue arrows labelled A and B are shown, starting at minus 100 on the Y axis, both pointing upwards. B is positioned just to the right of position 0.02 on the X axis. A is positioned to the right of B. The two arrows are of equal amplitude (height). B does not cross the hearing sensitivity threshold curve, but A does.</Paragraph></Description>
        </Figure>
        <Paragraph>A relatively loud sound at a particular frequency reduces our sensitivity to neighbouring frequencies. This is<b> frequency masking</b>. Figure 15 shows a loud sound A raising the perceptual hearing threshold in its vicinity. Sound B, which would otherwise be audible, is made inaudible. Under these circumstances, it would be unnecessary to encode sound B.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_olf3_5.eps.png" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_olf3_5.eps.png" x_folderhash="596a8165" x_contenthash="8388279c" x_imagesrc="tm355_olf3_5.eps.png" x_imagewidth="431" x_imageheight="188"/>
          <Caption><b>Figure 15</b> Frequency masking for two single-tone frequencies, A and B, with A louder than B</Caption>
        </Figure>
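<Paragraph>The encoding decision implied by frequency masking can be caricatured in a couple of lines of Python (the decibel figures are invented for illustration only):</Paragraph>

```python
def worth_encoding(amplitude_db, threshold_db):
    """A component need only be encoded if it exceeds the hearing
    threshold at its frequency - a threshold that masking may have raised."""
    return amplitude_db > threshold_db

# A tone at -70 dB clears a quiet threshold of -80 dB...
print(worth_encoding(-70, -80))  # True
# ...but not a threshold raised to -50 dB by a loud neighbouring tone
print(worth_encoding(-70, -50))  # False
```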
        <Paragraph>Another form of masking is temporal masking. This arises because our sensitivity to sounds in a narrow frequency range is reduced for a short period before and after the presence of a relatively strong sound in that frequency range. You may be surprised that sensitivity can be reduced before as well as after a relatively loud sound. This is a result of the way the auditory system and brain process audio information. </Paragraph>
        <Paragraph>Following a loud sound, it takes the ear up to 50 ms to be able to respond again to a much quieter sound. The resulting temporal masking envelope is displayed in Figure 16. The shaded region represents inaudible signal amplitudes following a very strong signal at time <i>T</i>. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f034.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f034.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="ee4671f5" x_imagesrc="tm355_bk2_pt3_f034.eps.png" x_imagewidth="431" x_imageheight="188"/>
          <Caption><b>Figure 16</b> Temporal masking effect of a loud sound at <i>T</i> and resulting inaudible envelope</Caption>
          <Description><Paragraph>This line graph has a Y axis labelled ‘relative signal amplitude’, measured in decibels, and has points minus 80 (bottom) and 0 (top) marked.</Paragraph><Paragraph>The X axis is labelled as time, measured in milliseconds. It has points T and T plus 50 marked. A long vertical arrow, pointing upwards, is positioned at T, extending from Y equals minus 80 to Y equals 0. Four short, parallel vertical arrows are shown close to the T plus 50 position. The first is about one quarter of the length of the long arrow, the second about half the length, the third about a third of the length and the fourth about one quarter of the length of the long arrow. </Paragraph><Paragraph>The waveform starts at position minus 80 on the Y axis, at a position just before T on the X axis. It rises as a straight diagonal line, reaching its peak value of Y equals 0 just after time T. It then remains constant at a relative signal amplitude of 0 decibels until approximately T plus 30. It then drops as a smooth curve to a minimum at T plus 50. </Paragraph><Paragraph>The area under the waveform from just before time T through to time T plus 50 is shaded and labelled ‘inaudible signal amplitudes’. </Paragraph></Description>
        </Figure>
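The post-masking envelope in Figure 16 can be approximated by a simple function of the time elapsed since the loud sound. The plateau length, decay shape and floor level below are illustrative assumptions made for this sketch; they are not taken from any standard.

```python
def post_masking_threshold_db(dt_ms, peak_db=0.0, floor_db=-80.0,
                              plateau_ms=30.0, decay_end_ms=50.0):
    """Relative level (dB) below which a sound dt_ms after the masker is
    inaudible: fully masked for ~30 ms, then decaying to the floor by ~50 ms.
    The linear decay is an assumption for illustration only."""
    if dt_ms < 0:
        return floor_db            # pre-masking is ignored in this sketch
    if dt_ms <= plateau_ms:
        return peak_db             # fully masked region
    if dt_ms < decay_end_ms:
        frac = (dt_ms - plateau_ms) / (decay_end_ms - plateau_ms)
        return peak_db + frac * (floor_db - peak_db)
    return floor_db                # masking has worn off

# A quiet (-40 dB) blip 20 ms after the masker lies below the threshold:
print(-40 < post_masking_threshold_db(20))  # → True: the blip is inaudible
```

This mirrors the listening activity later in this section: a blip 10 or 20 ms after the loud tone is masked, while one 60 ms later is heard.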
      </Section>
      <Section>
        <Title>3.5 MPEG audio layer 3 (MP3)</Title>
        <Paragraph>In connection with frequency masking, we said that the masked sound B in Figure 15 did not need to be encoded. You might wonder how sounds can be selectively encoded if others are present at the same time. The answer is by splitting the audio band into sub-bands which are encoded separately. If a masked sound occupies a different sub-band from a masking sound, one can be ignored and the other encoded.</Paragraph>
        <Paragraph>Figure 17 shows the elements of the creation of an MP3 audio file. The source input is generally assumed to be an audio data stream from either a CD (<i>f</i><sub>s</sub> = 44.1 kHz) or studio-recorded material (<i>f</i><sub>s</sub> = 48 kHz). The signal is filtered into 32 critical frequency sub-bands that are designed to reflect the way the ear perceives sounds. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f035.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f035.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="4f6f1c5d" x_imagesrc="tm355_bk2_pt3_f035.eps.png" x_imagewidth="461" x_imageheight="181"/>
          <Caption><b>Figure 17</b> MP3 encoder</Caption>
          <Description><Paragraph>The diagram shows three rectangles depicting a block diagram of an MP3 encoder. There are two rectangles side by side at the top of the diagram. The rectangle on the left is labelled ‘32 critical frequency filter bank’. The rectangle on the right is labelled ‘allocate number of bits’, and above this rectangle is the label ‘VBR coder’. </Paragraph><Paragraph>The third rectangle is below and between the two upper rectangles. It is labelled ‘compute masking levels’. Beneath this rectangle there is also the label ‘psychoacoustic model’. </Paragraph><Paragraph>The inputs to the MP3 encoder are shown as a blue arrow that enters from the left and splits to point at the two rectangles labelled ‘32 critical frequency filter bank’ and ‘compute masking levels’. </Paragraph><Paragraph>The output of the ’32 critical frequency filter bank’ rectangle is shown as another blue arrow that splits to point at the two rectangles labelled ‘compute masking levels’ and ‘allocate number of bits’. </Paragraph><Paragraph>The output of the ‘compute masking levels’ rectangle is shown as another blue arrow that points at the ‘allocate number of bits’ rectangle.</Paragraph><Paragraph>Finally, the output of the ‘allocate number of bits’ rectangle is shown as a blue arrow pointing to the right. This is labelled ‘MP3 bitstream’. </Paragraph></Description>
        </Figure>
        <Paragraph>The 32 critical sub-bands are sampled separately, yet this does not increase the total number of samples beyond what would be required if the audio band were not split into sub-bands. Sub-bands typically have a width of 750 Hz, for which the sampling theorem requires a minimum sampling rate of 2 × 0.75 kHz = 1.5 kHz. Therefore, across the 32 sub-bands, the minimum number of samples per second must be 32 × (1.5 × 10<sup>3</sup>) or 48 × 10<sup>3</sup>. This is exactly the same as for a single band with a total bandwidth of 32 × 0.75 kHz = 24 kHz, for which the sampling theorem requires the minimum sampling rate to be 48 kHz.</Paragraph>
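The arithmetic in the paragraph above can be checked directly: sampling 32 sub-bands of 750 Hz each requires no more samples per second in total than sampling one undivided 24 kHz band.

```python
# Sampling-rate arithmetic for the 32 MP3 sub-bands (figures from the text).
sub_bands = 32
sub_band_width_hz = 750

per_band_rate = 2 * sub_band_width_hz           # sampling theorem: 1.5 kHz
total_rate = sub_bands * per_band_rate          # 48 000 samples per second

single_band_width = sub_bands * sub_band_width_hz   # one 24 kHz band
single_band_rate = 2 * single_band_width            # also 48 kHz

print(total_rate, single_band_rate)  # → 48000 48000
```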
        <Paragraph>Once the source signal has been split into critical sub-bands, the next step is to determine the amount of masking in each sub-band and its effect on adjacent bands – the so-called <b>mask-to-noise ratio (MNR)</b>. This makes extensive use of the two psychoacoustic masking effects of the ear discussed above to govern the appropriate quantisation levels to be used in each different frequency sub-band. Collectively these define the <i>masking threshold</i>, which determines which frequencies will and will not be coded. </Paragraph>
        <Paragraph>If the signal level in a sub-band is below the masking threshold, it is not encoded; if it is above the threshold, it will be coded using variable bit-rate coding (VBR). In VBR, the number of bits allocated to represent each frequency component is based upon the level of quantisation noise. In digital audio, each additional bit improves the <i>S</i>/<i>N</i> ratio by approximately 6 dB, so the more bits allocated the higher the <i>S</i>/<i>N</i> ratio. </Paragraph>
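The 6 dB-per-bit relationship can be expressed as a one-line function. The commonly quoted approximation is about 6.02 dB per bit; applied to 16-bit samples it gives the familiar figure of roughly 96 dB for CD audio.

```python
def snr_db(bits):
    """Approximate quantisation S/N ratio for n-bit samples (~6.02 dB/bit)."""
    return 6.02 * bits

print(snr_db(16))  # about 96 dB, the familiar figure for 16-bit CD audio
```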
        <Paragraph>As an example, Table 1 shows the output levels of the first 12 critical sub-bands at a specific instant for an MP3 encoder. The output levels indicate the extent to which the level in any particular sub-band exceeds the threshold of hearing in that sub-band. If the output level were 0 in any sub-band, encoding would not be required in that sub-band because the output level would be on the threshold of audibility.</Paragraph>
        <Table>
          <TableHead>Table 1 Outputs from a sub-band MP3 encoder filter</TableHead>
          <tbody>
            <tr>
              <td><b>Critical sub-band</b></td>
              <td>1</td>
              <td>2</td>
              <td>3</td>
              <td>4</td>
              <td>5</td>
              <td>6</td>
              <td>7</td>
              <td>8</td>
              <td>9</td>
              <td>10</td>
              <td>11</td>
              <td>12</td>
            </tr>
            <tr>
              <td><b>Output level/dB</b></td>
              <td>18</td>
              <td>14</td>
              <td>42</td>
              <td>58</td>
              <td>12</td>
              <td>5</td>
              <td>10</td>
              <td>8</td>
              <td>6</td>
              <td>1</td>
              <td>4</td>
              <td>2</td>
            </tr>
          </tbody>
        </Table>
        <Paragraph>Sub-band 4 has a high output level of 58 dB. Suppose this produces an effective masking threshold of 16 dB to sub-band 5. As 16 dB exceeds sub-band 5’s output level of 12 dB, sub-band 5 does not need to be encoded in the time period covered by these output levels.</Paragraph>
        <Activity>
          <Heading>Activity 15 Self assessment</Heading>
          <Question>
            <Paragraph>Suppose sub-band 4 produces an effective masking threshold of 20 dB to sub-band 3. Does sub-band 3 need to be encoded?</Paragraph>
          </Question>
          <Answer>
            <Paragraph>The output level of sub-band 3 is 42 dB, which is above the masking threshold of 20 dB provided by sub-band 4, so this sub-band needs to be encoded.</Paragraph>
          </Answer>
        </Activity>
      </Section>
      <Section>
        <Title>MP3 continued</Title>
        <Paragraph>The last activity showed that sub-band 3 is not, in this instance, masked by the loud sound in sub-band 4. However, the raising of the threshold by 20 dB means that for encoding purposes sub-band 3’s output level is reduced. Specifically, as the output level of sub-band 3 exceeds the threshold by (42 – 20) dB, the effective level that needs to be encoded is only 22 dB. In the VBR encoding used in MP3, 1 bit is allocated per 6 dB of level above the threshold. This means that sub-band 3, which exceeds the threshold by 22 dB, needs an allocation of 4 bits to encode this sample. An allocation of 3 bits would be insufficient as 3 × 6 dB = 18 dB, which is below the level of 22 dB, whereas 4 × 6 dB = 24 dB, which is above 22 dB.</Paragraph>
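The bit-allocation arithmetic just described amounts to rounding up the excess over the threshold to a whole number of 6 dB steps. The sketch below illustrates only this rule; real MP3 bit allocation iterates over all sub-bands and frames within an overall bit budget.

```python
import math

def bits_needed(excess_db, db_per_bit=6.0):
    """Bits to allocate for a component exceeding the masking threshold by
    excess_db, at ~6 dB per bit (rounded up to cover the full excess)."""
    if excess_db <= 0:
        return 0            # at or below threshold: not encoded at all
    return math.ceil(excess_db / db_per_bit)

# Sub-band 3 exceeds its raised threshold by (42 - 20) = 22 dB:
print(bits_needed(42 - 20))  # → 4, since 3 × 6 = 18 dB < 22 dB ≤ 4 × 6 = 24 dB
```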
        <Paragraph>The procedure outlined here has to be carried out across all sub-bands where there is frequency masking from other sub-bands. The effect of temporal masking also has to be taken into account. These processes have to be repeated to cover the entire duration of the recording.</Paragraph>
        <Paragraph>MP3 achieves high-quality audio reproduction at 128 kbit s<sup>−1</sup>. This contrasts markedly with the CD bit rate of 1.4112 Mbit s<sup>−1</sup>. MP3 generally achieves 10:1 compression without introducing notable subjective effects into the reconstructed sound. Incidentally, it is common to refer to compressed audio files in terms of a bit rate in kbit s<sup>−1</sup> or Mbit s<sup>−1</sup> rather than as an actual file size. The reason for this convention is that MP3 and other audio formats are extensively used in streaming applications where the emphasis is on throughput and <i>quality of service</i> (QoS) rather than storage capacity. </Paragraph>
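The CD figure of 1.4112 Mbit s⁻¹ quoted above follows directly from the CD parameters (44.1 kHz sampling, 16 bits per sample, two channels), and dividing by the common MP3 rate of 128 kbit s⁻¹ gives roughly the 10:1 compression ratio mentioned:

```python
# Deriving the CD bit rate and the MP3 compression ratio from first principles.
fs = 44_100          # CD sampling rate, Hz
bits_per_sample = 16
channels = 2

cd_rate = fs * bits_per_sample * channels   # 1 411 200 bit/s = 1.4112 Mbit/s
mp3_rate = 128_000                          # common MP3 rate, bit/s

print(cd_rate, round(cd_rate / mp3_rate, 1))  # → 1411200 11.0
```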
        <Paragraph>The majority of MP3 recordings are made at 128 kbit s<sup>−1</sup>, which provides sufficient audio quality that most people (apart from hi-fi enthusiasts) would not notice the difference. As the bit rate drops to 64 kbit s<sup>−1</sup>, however, the loss becomes much more perceptible at the top (treble) end. The bass response also tends to degrade, and higher frequencies take on a distinctly artificial digital tone. The reason for this is that the MP3 developers decided to limit the audio bandwidth to approximately 16 kHz for 128 kbit s<sup>−1</sup> and only approximately 8 kHz for 64 kbit s<sup>−1</sup>. </Paragraph>
        <Activity>
          <Heading>Activity 16 Exploratory</Heading>
          <Question>
            <Paragraph>This activity ‘Perceptual sensitivity and masking’ allows you to explore some audio examples of the relative hearing sensitivity response of the ear, as well as frequency and temporal perceptual masking effects. </Paragraph>
            <InternalSection>
              <Heading>Frequency masking</Heading>
              <Paragraph>To demonstrate frequency masking, you will hear a relatively loud sine-wave tone (440 Hz) masking a quieter tone at a different frequency (652 Hz). The image below provides a visual representation of the audio clip: the horizontal direction represents time, the vertical direction represents amplitude, and the green shapes are the envelopes of the sine-wave tones. The sine waves are too closely packed for their cycles to be visible. You will probably find it helpful to look at this image while you play the audio clip.</Paragraph>
              <Figure>
                <Image webthumbnail="true" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_f001.png" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_oa3-10_f001.png" x_folderhash="596a8165" x_contenthash="62713eed" x_imagesrc="tm355_bk2_pt3_oa3-10_f001.png" x_imagewidth="880" x_imageheight="525" x_smallsrc="tm355_bk2_pt3_oa3-10_f001.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_oa3-10_f001.small.png" x_smallwidth="512" x_smallheight="305"/>
                <Description><Paragraph>The diagram is a screen shot from audio-editing software showing a graphical display of sounds. In this display the horizontal axis represents time. The sounds are shown as solid blocks of green. The height of the block represents the sound’s amplitude and its width represents its duration. Each block of sound is labelled with a letter. The letters go from A to I.</Paragraph><Paragraph>The first sound shown is a tall block of short duration, labelled A. This is two seconds of the 440 hertz masking tone. After this sound there is a short period of silence, and then a very shallow sound labelled B. This is two seconds of the quiet 652 hertz masked tone. This is followed by a short gap of silence.</Paragraph><Paragraph>The next sound is as tall as the first and is labelled C. This represents four seconds of the 440 hertz masking tone. This merges into a very slightly taller section labelled D, which consists of both the 440 hertz masking tone and the quiet 652 hertz masked tone. After four seconds the amplitude of this sound decreases over a period of about four seconds. This period of decreasing amplitude is labelled E. The amplitude decreases to the level of B, the 652 hertz masked tone, indicating that the 440 hertz masking tone has been reduced to zero, leaving just the 652 hertz masked tone.</Paragraph><Paragraph>The next section is F, and merges with E before it. It represents the 652 hertz masked tone. After four seconds of this, the amplitude of the block gradually gets bigger, indicating that the 440 hertz masking tone is brought back in and increases in volume. This part is labelled G. The amplitude reaches the same value as section D, and stays at this amplitude for four seconds. This section is labelled H. After four seconds the amplitude drops very slightly, indicating that the 652 hertz tone has dropped out leaving just the 440 hertz masking tone. This section is labelled I.</Paragraph></Description>
              </Figure>
              <Paragraph>The first two sounds in the clip are simply to familiarise you with the masking tone (440 Hz, shown in the figure at <b>A</b>) and quieter masked tone (652 Hz, shown at <b>B</b>). You will hear 2 seconds of each.</Paragraph>
              <Paragraph>The masking demonstration follows. There are 4 seconds of the 440 Hz masking tone (<b>C</b>). The quieter 652 Hz tone is then added, and the two tones are played for 4 seconds (<b>D</b>). When you play the audio clip, try to identify whether you can hear the 652 Hz tone during this part.</Paragraph>
              <Paragraph>The 440 Hz masking tone then fades out (<b>E</b>), leaving just the 652 Hz tone for 4 seconds (<b>F</b>). Finally, the 440 Hz masking tone gradually fades back in (<b>G</b>) and should eventually mask the 652 Hz tone (<b>H</b>). The audio clip ends with 4 seconds of just the 440 Hz tone again (<b>I</b>).</Paragraph>
              <Paragraph>Play the audio clip now.</Paragraph>
              <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_a001.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-10_a001_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="3022cb71">
                <Description>I heard a relatively loud single tone followed by a quieter, higher tone. When the two tones were played together, I heard only the louder, lower tone. However, as the lower tone decreased in volume, the quieter, higher tone seemed to fade in until I could hear it clearly. Similarly, as the lower tone faded back in and grew louder again, the higher tone seemed to fade out until I was no longer able to hear it. Thus although the quieter, higher tone was playing all along, I was only able to perceive it when the louder, lower tone was sufficiently quiet.</Description>
              </MediaContent>
            </InternalSection>
            <InternalSection>
              <Heading>Temporal masking</Heading>
              <Paragraph>In temporal masking, a loud sound makes a closely following sound inaudible. The effect is most noticeable when the following sound is relatively quiet, and when it follows after a very short gap. The image below shows the sequence of sounds used in each of the demonstration audio clips.</Paragraph>
              <Figure>
                <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_f002.png" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_oa3-10_f002.png" x_folderhash="596a8165" x_contenthash="db1b48c0" x_imagesrc="tm355_bk2_pt3_oa3-10_f002.png" x_imagewidth="512" x_imageheight="393"/>
                <Description><Paragraph>The diagram is a screen shot from audio-editing software showing a graphical display of sounds. In this display the horizontal axis represents time. The sounds are shown as solid blocks of green. The height of the block represents the sound’s amplitude and its width represents its duration.</Paragraph><Paragraph>The diagram shows two blocks of sound, with a gap between them. The first block of sound has a large amplitude and a relatively long duration. Its duration is actually one second, but it occupies most of the diagram. This block is labelled ‘loud 632 hertz tone’. After this comes a short gap of silence labelled ‘gap’. This is followed by a short, low-amplitude burst of sound labelled ‘quieter 632 hertz tone’.</Paragraph></Description>
              </Figure>
              <Paragraph>In each of the four audio clips below, a relatively long, large-amplitude 632 Hz tone is followed by a gap, and then a quieter version of the same tone.</Paragraph>
              <Paragraph>In the first clip, the gap between the two tones is fairly long (60 ms). The tone after the gap is audible as a very short blip (like a faint echo) after the main tone.</Paragraph>
              <Paragraph>In successive clips, the gap gets shorter. You should find that the final blip becomes inaudible as the gap decreases to 10 ms.</Paragraph>
              <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_a002.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-10_a002_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="c47b4439">
                <Caption>Gap = 60 ms</Caption>
              </MediaContent>
              <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_a003.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-10_a003_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="4ef6bef6">
                <Caption>Gap = 40 ms</Caption>
              </MediaContent>
              <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_a004.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-10_a004_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="3c0c07c8">
                <Caption>Gap = 20 ms</Caption>
              </MediaContent>
              <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-10_a005.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-10_a005_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="c4e54749">
                <Caption>Gap = 10 ms</Caption>
              </MediaContent>
            </InternalSection>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>3.6 MPEG-4 AAC (advanced audio coding)</Title>
        <Paragraph>MPEG-4 AAC (advanced audio coding) was designed as the successor to MP3 for low-bit-rate perceptual audio compression, with efficient internet multimedia streaming applications in mind. Its development was also motivated by the quest for efficient coding of multichannel surround-sound signals. So-called ‘5.1 surround sound’ includes five full bandwidth channels (left, right, centre, left surround and right surround), with the ‘point 1’ referring to a dedicated <b>low frequency effect (LFE)</b> channel carrying bass information in the 3 to 120 Hz band. </Paragraph>
        <Paragraph>AAC has now been formally embedded in both the MPEG-2 and MPEG-4 audio standards; it is the default format for various multimedia applications and services, from YouTube to Apple’s iTunes. The broad consensus is that, subjectively, the AAC encoder (.mp4 files) provides better audio quality for the same bit rate as MP3, with greater flexibility and functionality. In comparison with MP3, AAC offers a range of sampling rates up to 96 kHz, and also supports up to 48 channels (mono, stereo and multichannel surround sound). In terms of coding, it uses either 2048 or 256 sub-bands compared to 32 for MP3, thus providing better frequency resolution for the psychoacoustic modelling and perceptual masking steps.</Paragraph>
        <Paragraph>Another noteworthy feature of AAC encoders is that audio files do not have to be encoded at a specific streaming speed. Instead the file is coded once, then streamed at a variable bit rate depending on the connection speed and network traffic conditions. This is a consequence of AAC supporting scalable representations in terms of sample amplitudes (or <i>S</i>/<i>N</i> ratio) and sampling rates. </Paragraph>
        <Paragraph>MPEG-4 AAC and its variants excel at low bit rates by virtue of a series of extensions and tools that have evolved and subsequently become embedded into the standard. Figure 18 identifies three key tools that have been instrumental in the advancement of this standard:</Paragraph>
        <BulletedList>
          <ListItem>perceptual noise substitution (PNS)</ListItem>
          <ListItem>spectral band replication (SBR)</ListItem>
          <ListItem>parametric stereo (PS).</ListItem>
        </BulletedList>
        <Paragraph>Further information on each of these is readily available on the Web. While each tool to some extent adds complexity to the encoder, it also provides notable improvements in coding efficiency and corresponding audio quality. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_f036.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk2_pt3_f036.eps" width="100%" x_printonly="y" x_folderhash="596a8165" x_contenthash="636e05d0" x_imagesrc="tm355_bk2_pt3_f036.eps.png" x_imagewidth="458" x_imageheight="153"/>
          <Caption><b>Figure 18</b> MPEG-AAC audio encoder family</Caption>
          <Description><Paragraph>The figure shows three squares in a row, representing the key tools that have been instrumental in the advancement of the MPEG-4 AAC family of standards. From left to right, these are perceptual noise substitution (PNS), spectral band replication (SBR) and parametric stereo (PS).</Paragraph><Paragraph>Above the three squares is a row of four rectangles. From left to right, these are labelled MPEG-2 AAC-LC, MPEG-4 AAC-LC, MPEG-4 HE-AAC and MPEG-4 HE-AAC v2. Between each pair of rectangles is a short horizontal arrow pointing to the right, so three arrows in total.</Paragraph><Paragraph>The three squares are connected to the three arrows between the four rectangles. The connections are shown by three dashed vertical lines. </Paragraph></Description>
        </Figure>
        <Paragraph>AAC-LC (low complexity) is the most widely used coding profile in this standard, and the default format for Apple’s iTunes. Since AAC involves many varied processes in analysing different types of audio signal, no single algorithm is able to meet the diverse set of requirements it must fulfil. Therefore AAC has integrated different applications into a single framework covering music synthesis, low-bit-rate speech coding, text-to-speech synthesis and general perceptual audio compression across a host of different bit rates.</Paragraph>
        <Paragraph>The most recent AAC extension is High-Efficiency AAC (HE-AAC), also known as AACplus. It is specifically optimised for very-low-bit-rate applications such as audio streaming and podcasting, and is now the standard technology used in digital radio broadcasting. It incorporates SBR technology to encode and store high-frequency information as part of the standard, and is able to deliver near-CD quality sound at 64 kbit s<sup>−1</sup>. At the time of writing, the most recent version is HE-AAC version 2, which employs the third major extension in Figure 18 – <i>parametric stereo</i> (PS) – to improve the audio quality at low bit rates and increase compression by up to 40%. This analyses the spatial characteristics between the left and right channels of a stereo signal to exploit inter-channel redundancies. PS characterises the inter-channel features of the stereo signal and, depending on the source, typically provides a bit-rate saving of up to a factor of 10.</Paragraph>
        <Activity>
          <Heading>Activity 17 Exploratory</Heading>
          <Question>
            <Paragraph>This ‘Audio coding’ activity allows you to compare several versions of the same audio sample that have been compressed using different standards. </Paragraph>
            <Paragraph>In this activity you will hear a sample of speech that has been processed with different compression formats. In the order in which you will hear the speech samples, the formats used are the following four:</Paragraph>
            <BulletedList>
              <ListItem>MP3</ListItem>
              <ListItem>AAC LC</ListItem>
              <ListItem>HE-AAC v1</ListItem>
              <ListItem>HE-AAC v2.</ListItem>
            </BulletedList>
            <Paragraph>This is theoretically the order of increasing quality.</Paragraph>
            <Paragraph>All four extracts are at a bit rate of 16 kbit s<sup>−1</sup>. This low bit rate has been chosen to emphasise the differences in quality between the formats, which are less noticeable at higher bit rates.</Paragraph>
            <Paragraph>The speech extract used consists of the following two sentences:</Paragraph>
            <Quote>
              <Paragraph>In my garden I have an apple tree, a hazel tree and a pine tree. My neighbours have an apple tree too.</Paragraph>
            </Quote>
            <Paragraph>With each repetition the quality should improve, although many people find little difference between the second and third versions (AAC LC and HE-AAC v1).</Paragraph>
            <Paragraph>Play the audio clip now.</Paragraph>
            <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-11_a001.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-11_a001_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="d5566c45"/>
            <Paragraph>Since the greatest difference is between the first and last extracts in the above sample, the following sample uses just those extracts (that is, MP3 followed by HE-AAC v2).</Paragraph>
            <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk2_pt3_oa3-11_a002.mp3" type="audio" x_manifest="tm355_bk2_pt3_oa3-11_a002_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="2769ca02"/>
            <Paragraph/>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>3.7 Image and video compression</Title>
        <Paragraph>Section 3 has concentrated on lossy, perceptual coding of audio files, but perceptual coding is also widely used in image coding and video coding. </Paragraph>
        <Paragraph>Lossy compression in JPEG image coding exploits the fact that the human visual system is less sensitive to fine detail in an image than to broader features. JPEG coding transforms an image to spatial frequency components using a discrete cosine transform (DCT), then uses fewer bits to encode the higher spatial frequencies than the lower ones. Thresholding removes those components with very low amplitudes entirely; this is a lossy step. Further lossless compression completes the encoding, but overall the process is lossy.</Paragraph>
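The transform-and-threshold idea can be demonstrated with a one-dimensional toy version of the JPEG scheme: transform a block of sample values to frequency components with a DCT, discard the small components, and transform back. The block size, sample values and threshold below are illustrative; real JPEG works on 8 × 8 pixel blocks in two dimensions and quantises coefficients rather than simply zeroing them.

```python
import math

def dct(x):
    """DCT-II of a list of samples."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of dct() above (scaled DCT-III)."""
    N = len(X)
    return [X[0] / N + (2 / N) * sum(X[k] * math.cos(math.pi / N * (n + 0.5) * k)
                                     for k in range(1, N))
            for n in range(N)]

block = [52, 55, 61, 66, 70, 61, 64, 73]          # one row of pixel values
coeffs = dct(block)
kept = [c if abs(c) > 10 else 0 for c in coeffs]  # lossy thresholding
approx = idct(kept)

# The reconstruction is close to, but not identical with, the original block:
print([round(v) for v in approx])
```

With no thresholding the round trip reconstructs the block exactly (to floating-point precision); zeroing the small coefficients introduces a small, usually imperceptible, error, which is precisely the perceptual trade-off JPEG makes.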
        <Paragraph>As moving images consist of sequences of still images, the first step in video compression is to compress the individual still images (frames). The MPEG family of standards uses methods based on JPEG for compressing the still images, but then uses techniques based upon motion prediction and compensation to exploit the temporal similarities between consecutive image frames. Recent trends in video coding have led to the development of systems that incorporate multiview and 3D information, as well as more distributed approaches to video coding that shift some of the coding complexity from the encoder to the decoder.</Paragraph>
      </Section>
      <Section>
        <Title>3.8 Summary</Title>
        <Paragraph>Lossy compression is a type of source coding in which information is irretrievably lost. Quantisation is an example, but the term is mostly used in connection with perceptual coding of audio, images and video. </Paragraph>
        <Paragraph>MP3 audio coding uses lossy compression by exploiting frequency masking and temporal masking. The MP3 encoder identifies parts of a sound that are not perceived due to these two masking effects and does not encode them.</Paragraph>
        <Paragraph>Advanced audio coding (AAC) (used in MPEG 4) supports efficient multimedia streaming and is also used with surround sound. It offers a wider range of options than MP3 coding, and can use three key audio quality extensions without excessively increasing the bit rate: perceptual noise substitution, spectral band replication and parametric stereo.</Paragraph>
        <Paragraph>Perceptual coding is used in image files (such as JPEG) and in MPEG video files.</Paragraph>
      </Section>
    </Session>
    <Session>
      <Title>4 Broadband, mobile and WiFi</Title>
      <Paragraph>In this introductory podcast Allan Jones talks to Helen Donelan about issues related to this section.</Paragraph>
      <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn4-320x176.mp4" type="video" width="512" x_manifest="tm355_openlearn4_1_server_manifest.xml" x_filefolderhash="7586341f" x_folderhash="7586341f" x_contenthash="3903a203" x_subtitles="tm355_openlearn4-320x176.srt">
        <Transcript>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Hello, I’m Allan Jones, and I produced the material here in Section 4. Now for several years, I’ve benefited from being able to talk to colleagues about topics that relate to Section 4. With me is Helen Donelan, whom you met in Section 1. Hello, Helen. </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>Hello. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>Widespread mobile communications began with mobile analogue voice telephony, which is now referred to as first generation. And that was followed by digital telephony, referred to as 2G. Mobile data didn’t really take off, though, until the third generation, or 3G. This uses a wireless technology called Wideband Code Division Multiple Access, or WCDMA. What's distinctive about WCDMA? </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>Well, the trick with WCDMA is to replace each one and zero of data with a burst of very short binary signals that we call chips. Like digital data, chips are binary in the sense that they only have two possible states. But instead of calling these states one or zero, we call them one or minus one. </Remark>
          <Remark>So a one in the user’s data might be replaced by tens or hundreds of very brief chips, each being either a one or a minus one. The sequence of ones and minus ones used to represent each data bit is called a code. So in Wideband CDMA, each user has their own code or chip sequence for representing the ones and zeros of their data. If you get the codes right, you can transmit several people’s data simultaneously and on the same frequency. And there’s no interference between them. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK. In fourth generation, a very different procedure is used. What’s the basic idea here? </Remark>
          <Speaker>HELEN DONELAN</Speaker>
          <Remark>4G isn’t based on CDMA. It’s based on Orthogonal Frequency Division Multiplexing, or OFDM. The idea of OFDM is to divide the radio channel into many very narrow slices which we call subchannels. You could have around 100 subchannels or more in a single radio channel. Each of these slices can be treated independently of the others, with its own carrier wave and is modulated to carry data just within that subchannel. </Remark>
          <Remark>You can allocate some subchannels to one user and others to another user and so on. And that’s how multiple access is achieved in 4G. You get other benefits too, the main one being improved resistance to multipath interference. </Remark>
          <Speaker>ALLAN JONES</Speaker>
          <Remark>OK, thanks.</Remark>
        </Transcript>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_004-512.jpg" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_004-512.jpg" x_folderhash="596a8165" x_contenthash="fb7a7c5f" x_imagesrc="tm355_openlearn_004-512.jpg" x_imagewidth="512" x_imageheight="288"/>
        </Figure>
      </MediaContent>
      <Section>
        <Title>4.1 Introduction; access and core networks</Title>
        <Paragraph>Because of the ubiquity of wireless communications, such as WiFi and cellular mobile communications, it is tempting to think that it is only a matter of time before fixed-line communications, such as broadband and optical fibre, are superseded by forms of wireless communication. This is unlikely to occur because in many countries wireless communication depends on an infrastructure of fixed-line communications. In addition, it is likely that fixed-line communications will generally (but not in all cases) outperform wireless communication because of the greater amount of frequency spectrum available in fixed-line communications.</Paragraph>
        <Paragraph>The major fixed-line infrastructure in countries with a long history of telecommunications (such as the UK) is a ‘legacy’ public switched telecommunications network (PSTN) initially devised for telephony. The PSTN now carries much more than telephony. Parts of it have become essential infrastructure for carrying data as well as voice. In addition, there are newer networks such as those created for mobile telephone and data services. A useful structural division of these networks is as follows. </Paragraph>
        <UnNumberedList>
          <ListItem>A core network, which is largely a fixed, high-speed, intensively used communications network. It is somewhat analogous to a network of motorways and major trunk roads. Core networks often interconnect with other core networks. For example, all the mobile operators’ core networks interconnect with the PSTN core network.</ListItem>
          <ListItem>An access network, which links end-users’ equipment to the core network via a local exchange or local radio node. The access network is analogous to the minor roads that give access to motorways and other trunk routes.</ListItem>
          <ListItem>Consumer premises equipment (CPE) consists of the devices used by subscribers for consuming data (for example, fixed-line telephones, computers and fax machines). In the mobile world, user equipment (UE) is the term used for this part of the network.</ListItem>
        </UnNumberedList>
        <Paragraph>This section is concerned with three widely used access networks: </Paragraph>
        <UnNumberedList>
          <ListItem>DSL broadband, which is currently the most widely used form of fixed-line broadband;</ListItem>
          <ListItem>fourth generation (4G) mobile broadband;</ListItem>
          <ListItem>WiFi.</ListItem>
        </UnNumberedList>
        <Paragraph>Other forms of access, which this section does not look at, include:</Paragraph>
        <UnNumberedList>
          <ListItem>third generation (3G) mobile broadband;</ListItem>
          <ListItem>‘cable’, more properly known as Hybrid Fibre-Coaxial (HFC), which delivers television, broadband and telephony;</ListItem>
          <ListItem>optical fibre, which, in the form of ‘fibre to the premises’, can be used to give access to the core network (which itself is usually based on optical fibre).</ListItem>
        </UnNumberedList>
      </Section>
      <Section>
        <Title>4.2 Orthogonal frequency division multiplexing (OFDM)</Title>
        <Paragraph>Orthogonal Frequency Division Multiplexing (OFDM) and its close relative Orthogonal Frequency Division Multiple Access (OFDMA) are widely used forms of modulation. They are used in DSL broadband, 4G mobile communications, WiFi, digital television, powerline communications, cable television and digital audio broadcasting (DAB). </Paragraph>
        <Paragraph>Modulation always spreads the power of a transmission around the carrier frequency. In conventional frequency division multiplexing (FDM), as used in radio broadcasting for example, carrier frequencies are well separated to prevent the intrusion of power from neighbouring carriers. Figure 19 shows power distribution around some modulated carriers.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f020.eps" width="100%" webthumbnail="true" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f020.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="cf1e2603" x_imagesrc="tm355_bk3_pt2_f020.eps.png" x_imagewidth="700" x_imageheight="166" x_smallsrc="tm355_bk3_pt2_f020.eps.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f020.eps.small.png" x_smallwidth="509" x_smallheight="121"/>
          <Caption><b>Figure 19</b> Wide carrier separation in Frequency Division Multiplexing</Caption>
          <Description><Paragraph>This figure has two parts, labelled A and B.</Paragraph><Paragraph>Part A is a representation of frequency division multiplexing (FDM). There is a horizontal axis labelled ‘frequency’, and six equally spaced marks on the frequency axis are labelled ‘carrier frequencies’. These marks are quite far apart. Centred on each of the carrier frequencies is a wave shape that has a positive peak at the carrier frequency. On either sides of the carrier frequency, the height of the wave falls away symmetrically down to the axis, then drops below the axis to a small negative peak, then rises to a small positive peak, and the wave makes a couple of further, smaller ripples above and below the frequency axis. The ripples merge into those of the wave shapes centred on adjacent carrier frequencies.</Paragraph><Paragraph>Part B is a representation of orthogonal frequency division multiplexing (OFDM). Again there is a horizontal frequency axis marked with six equally spaced markers as in part A. The six marks are very close, unlike those in part A, and are labelled ‘subcarrier frequencies’. Centred on each frequency is a wave shape identical to those in part A; that is, a high positive peak centred on the marked frequency, falling away sharply on either side in ripples of decreasing amplitude. Because the frequency markers are so close, there is much more overlapping of the wave shapes than in part A, so that as the wave shape falls away on either side of the peak, at a value of about three-quarters of its peak value it intersects the wave shape centred on the adjacent subcarrier frequency. Each subcarrier peak coincides with zeros on all the other waveforms it overlaps with. An arrow from the end of the last subcarrier to the end of the frequency axis is labelled ‘bandwidth reduction’. It shows that the frequency range of part B of the diagram is a lot narrower than that of part A.</Paragraph></Description>
        </Figure>
        <Paragraph>In OFDM, by contrast, all the carriers (known as subcarriers) are closely packed. As a result, their spectra overlap. The expected mutual interference doesn’t occur because each subcarrier frequency coincides with zero power in all the other subcarriers, as shown in Figure 20. The subcarrier spacing required to make this happen is determined by a mathematical relationship between the spacing and the symbol rate of the modulation.</Paragraph>
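        <Paragraph>The orthogonality described above can be illustrated numerically. The following sketch (not part of the course material) checks that two complex-exponential subcarriers spaced at exactly the symbol rate 1/T have zero inner product over one symbol period, while a subcarrier correlated with itself gives full power.</Paragraph>

```python
import numpy as np

# Illustrative sketch: subcarriers spaced at exactly 1/T (the symbol
# rate) are orthogonal over one symbol period T. The subcarrier indices
# 3 and 4 below are arbitrary choices for the demonstration.
T = 1.0                  # symbol period (arbitrary units)
fs = 1000                # samples per symbol period
t = np.arange(fs) / fs * T

def subcarrier(k):
    """Complex exponential for subcarrier index k, spacing 1/T."""
    return np.exp(2j * np.pi * k / T * t)

# Normalised inner product over one symbol period: different
# subcarriers give (essentially) zero, so no mutual interference.
cross = np.vdot(subcarrier(3), subcarrier(4)) / fs
self_ = np.vdot(subcarrier(3), subcarrier(3)) / fs

print(abs(cross))  # effectively zero
print(abs(self_))  # 1.0
```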
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f021.eps" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f021.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="86a1a1a0" x_imagesrc="tm355_bk3_pt2_f021.eps.png" x_imagewidth="368" x_imageheight="264"/>
          <Caption><b>Figure 20</b> OFDM subcarriers</Caption>
          <Description>This is a closer view of Figure 19 part B. There is a horizontal frequency axis marked with four equally spaced markers labelled ‘subcarrier frequencies’. Centred on each frequency is a wave shape with a high positive peak centred on the marked frequency, falling away sharply on either side in ripples of decreasing amplitude. Because the frequency markers are close together, there is a lot of overlap of the wave shapes, so that as the wave shape falls away on either side of the peak, at a value of about three-quarters of its peak value it intersects the wave shape centred on the adjacent subcarrier frequency. Each subcarrier peak coincides with zeros on all the other waveforms it overlaps with.</Description>
        </Figure>
        <Paragraph>All the subcarriers operate at the same symbol rate, but each is usually modulated independently of the others. We can think of each subcarrier as being at the centre of a narrow frequency channel, as in Figure 21. In OFDM, these channels are called subchannels, and their width is equal to the subcarrier spacing. </Paragraph>
        <Figure>
          <Image width="100%" src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f022.eps" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f022.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="9825085c" x_imagesrc="tm355_bk3_pt2_f022.eps.png" x_imagewidth="512" x_imageheight="303"/>
          <Caption><b>Figure 21</b> Subchannels centred on subcarrier frequencies</Caption>
          <Description><Paragraph>This is based on Figure 20, which is shown faintly in the background.</Paragraph><Paragraph>There is a horizontal frequency axis marked with six equally spaced markers labelled ‘subcarrier frequencies’. Centred on each frequency is a faint wave shape with a high positive peak centred on the marked frequency, falling away sharply on either side in ripples of decreasing amplitude. Because the frequency markers are close together, there is a lot of overlap of the wave shapes, so that as the wave shape falls away on either side of the peak, at a value of about three-quarters of its peak value it intersects the wave shape centred on the adjacent subcarrier frequency. Each subcarrier peak coincides with zeros on all the other waveforms it overlaps with.</Paragraph><Paragraph>A series of six tall, thin, adjoining rectangles is shown much more boldly over the wave shapes. Each rectangle is centred on a subcarrier frequency, and each is as tall as the peak of the wave shape underneath. Each rectangle is wide enough to encompass the peak of the wave shape underneath, and the wave as it drops away from the peak and intersects the adjacent waveform.</Paragraph><Paragraph>The rectangles are labelled ‘subchannels’, and the width of a rectangle is labelled ‘subchannel width’. The total width of the six rectangles is labelled ‘overall bandwidth of multiplex’.</Paragraph></Description>
        </Figure>
        <Paragraph>A good example of the use of subcarriers and subchannels is Digital Subscriber Line broadband, or DSL.</Paragraph>
      </Section>
      <Section>
        <Title>4.3 Digital Subscriber Line (DSL) broadband</Title>
        <Paragraph>In DSL an ordinary telephone line (consisting of a pair of twisted copper wires) is used to deliver broadband to homes and offices. DSL uses a variant of OFDM called Discrete Multitone (or DMT). A common form of DSL is Asymmetrical DSL, or ADSL, which itself is available as ADSL1, ADSL2 and ADSL2+. Figure 22 shows the ADSL subchannels, which are numbered from zero upwards.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt3_f009.eps" width="100%" webthumbnail="true" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f009.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="b57f4ca8" x_imagesrc="tm355_bk3_pt3_f009.eps.png" x_imagewidth="701" x_imageheight="234" x_smallsrc="tm355_bk3_pt3_f009.eps.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f009.eps.small.png" x_smallwidth="512" x_smallheight="171"/>
          <Caption><b>Figure 22</b> ADSL subchannels</Caption>
          <Description><Paragraph>The frequency bands are arranged horizontally, on a scale that is marked at 0, 4.3125 kilohertz, 25.875 kilohertz, 138 kilohertz, 1.104 megahertz, and 2.208 megahertz.</Paragraph><Paragraph>On the left there is a small rectangular blue section labelled ‘POTS’. This ranges from 0 to 4 kilohertz. It is numbered 0.</Paragraph><Paragraph>There is a gap between 4 kilohertz and 25.875 kilohertz. This is marked as unused and subdivided into subchannels 1 to 5. The numbers are labelled ‘subchannel numbers (indexes)’. A horizontal arrow indicates that each subchannel width is 4.3125 kilohertz.</Paragraph><Paragraph>Then there is a rectangular green section labelled ‘upstream’. This ranges from 25.875 kilohertz to 138 kilohertz. It is divided into subchannels 6 to 31. One subchannel, for which no number is given, is labelled ‘pilot’.</Paragraph><Paragraph>Finally, on the right, there is a large rectangular red section labelled ‘downstream’. This ranges from 138 kilohertz to 2.208 megahertz. It is divided into subchannels 32 to 511 (but subchannel 32 is white as it is not used). One of the midrange subchannel, somewhere between 33 and 255, is labelled ‘pilot’.</Paragraph><Paragraph>Within the red area, the range of ADSL1 and ADSL2 is represented by a horizontal arrow from a frequency of 138 kilohertz to a frequency of 1.104 megahertz (subchannels 32 to 255). The range of ADSL2+ is represented by a horizontal arrow from a frequency of 138 kilohertz to a frequency of 2.208 megahertz (subchannels 32 to 511).</Paragraph></Description>
        </Figure>
        <Paragraph>The subchannels in DSL are 4.3125 kHz wide. Subchannel zero is reserved for the ‘Plain Old Telephony Service’ (POTS), which is analogue telephony. Subchannels 6 to 31 are for upstream data (from the user to the telephone exchange), and subchannels from 32 upwards are for downstream data (from the exchange to the user). </Paragraph>
        <Paragraph>Some subchannels are unused, for example subchannels 1 to 5 and 32. These provide a guard band between groups of subchannels. Other subchannels do not carry user data but are used for pilot tones required for the proper functioning of the system.</Paragraph>
        <Paragraph>Downstream subchannels greatly outnumber upstream subchannels, hence the ‘asymmetry’ of ADSL. The downstream data rate is therefore much higher than the upstream rate.</Paragraph>
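        <Paragraph>The frequency layout in Figure 22 follows directly from the 4.3125 kHz subchannel width. A short sketch (using only the numbers given in the text) computes the band edges of any subchannel from its index and reproduces the boundary frequencies quoted in the figure.</Paragraph>

```python
# ADSL subchannels are 4.3125 kHz wide and numbered from 0, so
# subchannel n occupies n * 4.3125 kHz up to (n + 1) * 4.3125 kHz.
SPACING_KHZ = 4.3125

def band_edges_khz(index):
    """Lower and upper frequency (kHz) of the given subchannel index."""
    return index * SPACING_KHZ, (index + 1) * SPACING_KHZ

# Upstream band is subchannels 6 to 31: 25.875 kHz to 138 kHz.
print(band_edges_khz(6)[0])    # 25.875
print(band_edges_khz(31)[1])   # 138.0
# ADSL2+ downstream top is subchannel 511: 2208 kHz = 2.208 MHz.
print(band_edges_khz(511)[1])  # 2208.0
```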
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt3_f010.eps" width="100%" webthumbnail="true" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f010.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="28cfbc54" x_imagesrc="tm355_bk3_pt3_f010.eps.png" x_imagewidth="707" x_imageheight="307" x_smallsrc="tm355_bk3_pt3_f010.eps.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f010.eps.small.png" x_smallwidth="498" x_smallheight="216"/>
          <Caption><b>Figure 23</b> Signal-to-noise ratio of a 2 km long telephone line</Caption>
          <Description><Paragraph>A line graph is shown with the X axis labelled tone number. It is marked from 0 to 408 in increments of 17. The Y axis is labelled S/N ratio per tone in decibels and is marked from minus 20 (bottom) to 70 (top) in increments of ten.</Paragraph><Paragraph>The upstream signal is shown to the left of the graph, in green, starting just after tone number zero, at 0 decibels. It rises rapidly to a peak of just over 50 decibels at about tone number 17. It then falls back to about 40 decibels just before tone number 34, then drops rapidly to 0 decibels.</Paragraph><Paragraph>The downstream signal is shown in red and starts at 0 decibels where the upstream signal ends. The downstream signal rises rapidly from 0 decibels over the space of about 10 tones, rising to over 50 decibels around tone number 42 and at around tone 50. It then fluctuates as it drops steadily downwards. At several points, it drops to 0 decibels and then rises again to slightly below its original position. The downward fluctuation continues until tone number 365.</Paragraph></Description>
        </Figure>
        <Paragraph>Figure 23 shows the signal-to-noise ratio of a 2 km long telephone line. The ‘tone numbers’ on the horizontal axis are the same as subchannel numbers. The line gets noisier at higher tone numbers, as almost always happens. In places the subchannels are so noisy as to be unusable, and this is more common at the upper end. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt3_f011.eps" width="100%" webthumbnail="true" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f011.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="848005f6" x_imagesrc="tm355_bk3_pt3_f011.eps.png" x_imagewidth="700" x_imageheight="283" x_smallsrc="tm355_bk3_pt3_f011.eps.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f011.eps.small.png" x_smallwidth="493" x_smallheight="199"/>
          <Caption><b>Figure 24</b> Bit loading of a 2 km ADSL2+ line</Caption>
          <Description><Paragraph>A line graph is shown with the X axis labelled tone number. It is marked from 0 to 408 in increments of 17. The Y axis is labelled bits and is marked from 0 (bottom) to 20 (top) in increments of two.</Paragraph><Paragraph>The upstream bits are shown to the left of the graph, in green, starting just after tone number zero, at 0 bits. The graph rises rapidly to a peak of 14 bits at about tone number 17. This is labelled ‘highest bit loading’. It then falls back to zero just before tone number 34.</Paragraph><Paragraph>The downstream bits are shown in red and start where the upstream signal ends. From 0 bits at just before tone number 34, the graph rises rapidly to 12 bits just after tone number 34. At about tone number 51 there is a peak of 13 bits, labelled ‘highest downstream bit loading’. The downstream bits then fluctuate as they drop steadily downwards. At several points, they drop to 0 bits and then rise again to slightly below the original position. The downward fluctuation continues until tone number 365. A few points where the number of bits has fallen to 1 are labelled ‘lowest bit loading’.</Paragraph></Description>
        </Figure>
        <Paragraph>Figure 24 shows the bit loading per subchannel, which is the number of bits per symbol per subchannel. The better, less noisy subchannels (generally at lower numbers) are loaded with more bits than are the poorer subchannels. QAM modulation is used in each subchannel, enabling the bit loading to be varied. For example, 64-QAM gives 6 bits per symbol (because 2<sup>6</sup> = 64), and 16-QAM gives 4 bits per symbol (because 2<sup>4</sup> = 16). There are 4000 symbols per second in the widely used versions of DSL, so Figure 24 shows the number of bits per subchannel per symbol period.</Paragraph>
        <Paragraph>Because each telephone line has a different noise characteristic, which varies over time, a ‘one size fits all’ modulation scheme (where there is a standard order of QAM for each subchannel) would not be very efficient. OFDM, however, allows optimum use of each line because the subchannel bit loading can be automatically customised for the prevailing noise at each location. The bit loading can change as noise conditions change during the day, or from day to day.</Paragraph>
        <Paragraph>If a particular subchannel is not loaded to its capacity, it can be topped up with bits that would otherwise have exceeded the capacity of another subchannel. This process is known as <b>bit swapping</b>, and is a distinguishing feature of DMT (as opposed to OFDM). </Paragraph>
        <Paragraph>Despite the order of QAM being chosen to suit the noise conditions, errors still occur. Reed-Solomon error correction is therefore incorporated into DSL, and on particularly poor lines interleaving of data units may also be used. This increases the latency of data transmission.</Paragraph>
        <Activity>
          <Heading>Activity 18 Self assessment</Heading>
          <Question>
            <Paragraph>If every downstream subchannel in ADSL2+ were loaded to 13 bits, what would be the data rate? Ignore unused subchannels.</Paragraph>
          </Question>
          <Answer>
            <Paragraph>There are 4000 symbols per second. Each symbol in each subchannel is loaded to 13 bits, so data rate per subchannel is </Paragraph>
            <Paragraph>4000 × 13 bit s<sup>–1</sup> = 52 kbit s<sup>–1</sup></Paragraph>
            <Paragraph>The highest numbered downstream subchannel is 511. The lowest is 32. The total number of subchannels is therefore 480. Therefore the overall data rate is</Paragraph>
            <Paragraph>480 × 52 kbit s<sup>–1</sup> = 24.96 Mbit s<sup>–1</sup></Paragraph>
          </Answer>
        </Activity>
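        <Paragraph>The arithmetic in the answer above can be expressed as a short script, using the figures given in the activity.</Paragraph>

```python
SYMBOL_RATE = 4000      # symbols per second
BITS_PER_SYMBOL = 13    # bit loading assumed in the activity

subchannel_rate = SYMBOL_RATE * BITS_PER_SYMBOL  # 52000 bit/s = 52 kbit/s
n_subchannels = 511 - 32 + 1                     # downstream 32..511 inclusive
total_rate = n_subchannels * subchannel_rate

print(subchannel_rate)   # 52000 bit/s per subchannel
print(n_subchannels)     # 480 downstream subchannels
print(total_rate / 1e6)  # 24.96 Mbit/s overall
```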
      </Section>
      <Section>
        <Title>4.4 VDSL2</Title>
        <Paragraph>VDSL2 (version 2 of Very High Bit Rate DSL) achieves higher data rates than ADSL by shortening the copper line. Shortening results from the use of a locally installed connection box like the one shown in Figure 25. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt3_f013.eps" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f013.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="b220f770" x_imagesrc="tm355_bk3_pt3_f013.eps.png" x_imagewidth="512" x_imageheight="444"/>
          <Caption><b>Figure 25</b> Fibre box used with VDSL2</Caption>
        </Figure>
        <Paragraph>The box contains, effectively, a piece of telephone-exchange equipment which is linked to the local exchange by optical fibre. Subscribers have the copper line from their home terminated at the box – at least as far as data is concerned. (Telephony continues to use copper wire to the exchange.)</Paragraph>
        <Paragraph>Because signal attenuation per metre of copper wire increases with increasing frequency, a major benefit of a short copper line is that higher frequencies can be used and therefore more subchannels made available. VDSL2 uses many more subchannels than any version of ADSL. Various frequency plans, or profiles, are available for VDSL2. Figure 26 is a common one, called ‘17a’, in which the highest frequency is just above 17 MHz.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt3_f014.eps" width="100%" webthumbnail="true" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f014.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="242bb84a" x_imagesrc="tm355_bk3_pt3_f014.eps.png" x_imagewidth="701" x_imageheight="227" x_smallsrc="tm355_bk3_pt3_f014.eps.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt3_f014.eps.small.png" x_smallwidth="408" x_smallheight="133"/>
          <Caption><b>Figure 26</b> VDSL2 profile 17a</Caption>
          <Description><Paragraph>Shows a diagram with the X axis marked as frequency in megahertz. The frequency is marked at 0.138, 3.75, 5.2, 8.5,12, 17.664 and 30. The upstream and downstream bands are shown as alternating sections of green and red. Upstream 0 is from 0 to 0.138 megahertz, downstream 1 is from 0.138 to 3.75 megahertz, upstream1 is from 3.75 to 5.2 megahertz, downstream 2 is from 5.2 to 8.5 megahertz, upstream 2 is from 8.5 to 12 megahertz, and downstream 3 is from 12 to 17.664 megahertz.</Paragraph></Description>
        </Figure>
        <Paragraph>Downstream data rates with VDSL2 are typically around 30 to 80 Mbit s<sup>–1</sup>, and upstream rates can be in the region of 20 Mbit s<sup>–1</sup>, although providers often set a lower rate.</Paragraph>
        <Paragraph>Figure 27 is a settings screen from a domestic VDSL2 broadband router. A range of VDSL2 profiles is shown, and because the symbol rate and subchannel width are standard across VDSL2 and ADSL, the router can also cope with ADSL (as is evident in the ‘Modulation’ row). ‘G.DMT’ is an informal name for the first ADSL standard, now usually referred to as ADSL1. ‘G.lite’ is a very basic form of ADSL that did not catch on. ‘SRA’ is Seamless Rate Adaptation and is one way of dynamically changing the speed of the connection according to noise conditions.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_openlearn_fig4_9.png" webthumbnail="true" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_fig4_9.png" x_folderhash="596a8165" x_contenthash="bc9e117b" x_imagesrc="tm355_openlearn_fig4_9.png" x_imagewidth="880" x_imageheight="216" x_smallsrc="tm355_openlearn_fig4_9.small.png" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_openlearn_fig4_9.small.png" x_smallwidth="512" x_smallheight="126"/>
          <Caption><b>Figure 27</b> Settings screen of a domestic DSL router</Caption>
        </Figure>
      </Section>
      <Section>
        <Title>4.5 4G mobile broadband</Title>
        <Paragraph>Fourth generation (4G) mobile communication is regarded as part of a so-called Long Term Evolution from 3G, and is often referred to as LTE, or 4G LTE. It was the first mobile communication system designed only for data (as opposed to 3G’s voice-and-data design, and 2G’s voice-only design on to which a data service was grafted). The radio bands used are generally in the region of 800 MHz, 1.8 GHz, 2.1 GHz and 2.6 GHz.</Paragraph>
        <Paragraph>4G uses OFDMA in the downlink (from the base station to the user). The uplink (from user to base station) uses the related Single Carrier Frequency Division Multiple Access (SCFDMA), which is more energy-efficient. (SCFDMA is not covered here.) The uplink and downlink generally occupy different frequency bands, in common with the widely adopted versions of 2G and 3G.</Paragraph>
        <Paragraph>All mobile data communication uses the idea of a unit of resource in the downlink. A resource unit is a stream of data allocated exclusively to a user for, typically, a short time. Figure 28 represents a resource block used in 4G. Subchannel index numbers are placed up the vertical axis (unlike the earlier diagrams for DSL where they were on the horizontal axis). There are 12 subchannels in the block, each 15 kHz wide, so a block occupies 12 × 15 kHz = 180 kHz of spectrum. A base station might typically use a channel 20 MHz wide, in which case the number of resource blocks available would be in the region of 110.</Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f028.eps" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f028.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="d5bf3b44" x_imagesrc="tm355_bk3_pt2_f028.eps.png" x_imagewidth="512" x_imageheight="480"/>
          <Caption><b>Figure 28</b> Resource block for 4G</Caption>
          <Description><Paragraph>A line graph is shown with the X axis labelled time and the Y axis labelled frequency.</Paragraph><Paragraph>On the graph, a resource block is shown as a rectangular shape, divided into small square sections with vertical and horizontal lines. One of these sections is labelled ‘resource element’.</Paragraph><Paragraph>The resource block is seven resource elements wide. The sections along the X axis are numbered from 0 to 6, indicating OFDM symbol times. The total time taken by the seven sections is marked as one slot, 0.5 milliseconds.</Paragraph><Paragraph>The resource block is twelve resource elements high. The sections along the Y axis are numbered from 0 to 11, indicating subchannel indexes. The height of a section is labelled ‘subcarrier separation equals 15 kilohertz’. The total frequency, or the total height of the twelve sections, is marked as 180 kilohertz (which equals 12 times 15 kilohertz).</Paragraph></Description>
        </Figure>
        <Paragraph>The horizontal axis in Figure 28 represents time, but in units of symbol periods (i.e. symbol duration). Symbols 1 to 6 each have a symbol period of slightly under 71.4 μs. Symbol 0 is given a slightly longer symbol period of 71.8 μs just to make the total slot time up to 0.5 ms. For poorer signal conditions the slot time of 0.5 ms is divided into six symbol periods rather than seven, giving a lower data rate but greater resilience to noise.</Paragraph>
        <Activity>
          <Heading>Activity 19 Self assessment</Heading>
          <Question>
            <NumberedList class="lower-alpha">
              <ListItem>If 16 QAM is used throughout the resource block in Figure 28, how much data does this block convey?</ListItem>
              <ListItem>Hence what data rate is represented by a continuous allocation of 1 resource block to a user?</ListItem>
            </NumberedList>
          </Question>
          <Answer>
            <NumberedList class="lower-alpha">
              <ListItem>Each resource element in Figure 28 is one symbol in one channel. In 16 QAM a symbol carries 4 bits of data. There are 7 × 12 = 84 resource elements in the block. Each is loaded with 4 bits, so total data is 84 × 4 bits = 336 bits.</ListItem>
              <ListItem>Each block lasts 0.5 ms, so there are 2000 blocks per second. A continuous allocation of one block to a user gives a data rate of 2000 × 336 bit s<sup>–1</sup> = 672 kbit s<sup>–1</sup></ListItem>
            </NumberedList>
          </Answer>
        </Activity>
        <Paragraph>The smallest number of resource blocks that can be allocated to a user at any moment (apart from zero) is 6. The shortest duration of an allocation is two slots, or 1 ms. Overall, therefore, the smallest allocation possible is conceptually six resource blocks high and two resource blocks (slots) wide, giving a minimum resource allocation of 12 resource blocks in total (Figure 29). The six resource blocks are not necessarily adjacent in frequency. </Paragraph>
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f029.eps" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f029.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="217d6af2" x_imagesrc="tm355_bk3_pt2_f029.eps.png" x_imagewidth="126" x_imageheight="512"/>
          <Caption><b>Figure 29</b> Smallest resource-block allocation in 4G</Caption>
          <Description>Twelve resource blocks are shown, but with no detail. These are arranged in six rows and two columns. The six rows are labelled ‘six blocks’. The two columns are labelled ‘two slots (equals 1 millisecond)’.</Description>
        </Figure>
        <Paragraph>The 1 ms duration of the minimum allocation is the scheduling interval. At intervals of 1 ms the allocation of resources to users is reviewed, and the resource allocation and modulation methods changed if necessary, for example, in order to take account of changing link quality between base station and user.</Paragraph>
        <Activity>
          <Heading>Activity 20 Self assessment</Heading>
          <Question>
            <Paragraph>What data rate does a continuous minimum allocation give if 64 QAM is used?</Paragraph>
          </Question>
          <Answer>
            <Paragraph>Activity 19 shows that allocating a single resource block using 16-QAM gives 672 kbit s<sup>–1</sup>. Whereas 16-QAM gives 4 bits per symbol, 64-QAM gives 6 bits per symbol, or 3/2, or 1.5, times as many as 16-QAM. The minimum allocation is 6 resource blocks, so the data rate is 672 kbit s<sup>–1</sup> × 1.5 × 6 = 6.048 Mbit s<sup>–1</sup></Paragraph>
          </Answer>
        </Activity>
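        <Paragraph>The Activity 20 working generalises naturally. This Python sketch (the function name is invented for illustration) parameterises the data rate by the number of resource blocks and the bits carried per resource element, ignoring signalling and coding overheads just as the activities do:</Paragraph>

```python
# Data rate for a continuous allocation of LTE resource blocks.
# 84 resource elements per block and 2000 blocks per second follow
# from Figure 28; overheads are ignored, as in Activities 19 and 20.
def allocation_rate(blocks: int, bits_per_symbol: int) -> int:
    """Data rate in bit/s for a continuous allocation."""
    bits_per_block = 84 * bits_per_symbol
    return blocks * bits_per_block * 2000

print(allocation_rate(6, 6))  # 6048000 (64-QAM, minimum allocation)
print(allocation_rate(6, 4))  # 4032000 (16-QAM, minimum allocation)
```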
        <Paragraph>A user would not get the data rate calculated in the last example for two reasons. Firstly, not all the resource elements in a block can carry user data. Some are reserved for signalling and pilot data, which are needed by the system itself to ensure that it functions properly. Secondly, error control is used, which always results in some redundancy.</Paragraph>
        <Paragraph>The allocation of resources to multiple users on demand is known as ‘multiple access’, and is a feature of mobile communications systems. For example, although 3G used a very different system from OFDMA (called Wideband Code Division Multiple Access, or WCDMA), it too depends on a limited resource being shared out among multiple users. The rapid review and revision of resource allocation as conditions change is a feature of mobile communications known as scheduling.</Paragraph>
      </Section>
      <Section>
        <Title>4.6 WiFi</Title>
        <Paragraph>WiFi has very limited mobility management, so it is not regarded as a mobile communications system. It also operates at much lower power levels than mobile, giving a reduced range, and it uses unlicensed spectrum as opposed to the licensed spectrum allocated to mobile operators for their exclusive use. Anyone can use unlicensed spectrum (subject to certain restrictions), so WiFi has no guarantee of quality or reliability. Table 2 summarises some of the more significant versions of WiFi in chronological order.</Paragraph>
        <Table>
          <TableHead>Table 2 Some versions of 802.11</TableHead>
          <tbody>
            <tr>
              <th class="ColumnHeadLeft">Standard</th>
              <th class="ColumnHeadLeft">Year</th>
              <th class="ColumnHeadLeft"><Paragraph>Band</Paragraph><Paragraph>/GHz</Paragraph></th>
              <th class="ColumnHeadLeft">Channel(s)<br/>/MHz</th>
              <th class="ColumnHeadLeft">Modulation method</th>
              <th class="ColumnHeadLeft">Highest modulation order</th>
              <th class="ColumnHeadLeft">Highest code rate</th>
              <th class="ColumnHeadLeft">MIMO streams</th>
              <th class="ColumnHeadLeft">Max transmission rate per MIMO stream<br/>/Mb s<sup>–1</sup></th>
            </tr>
            <tr>
              <td class="TableLeft">802.11a</td>
              <td class="TableLeft">1999</td>
              <td class="TableLeft">5</td>
              <td class="TableLeft">20</td>
              <td class="TableLeft">OFDM</td>
              <td class="TableLeft">64 QAM</td>
              <td class="TableLeft">3/4</td>
              <td class="TableLeft">N/A</td>
              <td class="TableLeft">54</td>
            </tr>
            <tr>
              <td class="TableLeft">802.11b</td>
              <td class="TableLeft">1999</td>
              <td class="TableLeft">2.4</td>
              <td class="TableLeft">22</td>
              <td class="TableLeft">CCK</td>
              <td class="TableLeft">QPSK</td>
              <td class="TableLeft">1/2</td>
              <td class="TableLeft">N/A</td>
              <td class="TableLeft">11</td>
            </tr>
            <tr>
              <td class="TableLeft">802.11g</td>
              <td class="TableLeft">2003</td>
              <td class="TableLeft">2.4</td>
              <td class="TableLeft">20</td>
              <td class="TableLeft">OFDM</td>
              <td class="TableLeft">64 QAM</td>
              <td class="TableLeft">3/4</td>
              <td class="TableLeft">N/A</td>
              <td class="TableLeft">54</td>
            </tr>
            <tr>
              <td class="TableLeft">802.11n</td>
              <td class="TableLeft">2009</td>
              <td class="TableLeft">2.4/5</td>
              <td class="TableLeft">20/40</td>
              <td class="TableLeft">OFDM</td>
              <td class="TableLeft">64 QAM</td>
              <td class="TableLeft">5/6</td>
              <td class="TableLeft">up to 4 streams total</td>
              <td class="TableLeft"><Paragraph>65</Paragraph><Paragraph>(in 20 MHz)</Paragraph></td>
            </tr>
            <tr>
              <td class="TableLeft">802.11ad</td>
              <td class="TableLeft">2012</td>
              <td class="TableLeft">60</td>
              <td class="TableLeft">2160</td>
              <td class="TableLeft">OFDM</td>
              <td class="TableLeft">64 QAM</td>
              <td class="TableLeft">13/16</td>
              <td class="TableLeft">N/A</td>
              <td class="TableLeft">6756.75</td>
            </tr>
            <tr>
              <td class="TableLeft">802.11ac</td>
              <td class="TableLeft">2014</td>
              <td class="TableLeft">5</td>
              <td class="TableLeft">20/40/<br/>80/160</td>
              <td class="TableLeft">OFDM</td>
              <td class="TableLeft">256 QAM</td>
              <td class="TableLeft">5/6</td>
              <td class="TableLeft">up to 8 streams</td>
              <td class="TableLeft">180 (in 40 MHz; highest coding rate in 20 MHz = 3/4)</td>
            </tr>
          </tbody>
        </Table>
        <Paragraph>The ‘modulation method’ column shows that OFDM is standard in all versions apart from 802.11b. Not only has OFDM become standard, but the specification of OFDM has remained consistent. Subchannel width is 312.5 kHz across all WiFi versions except 802.11ad.</Paragraph>
        <Paragraph>The maximum transmission rates shown in Table 2 need to be read in conjunction with the number of channels used, as the 802.11n and ac standards allow channels to be combined. Combining two adjacent 20 MHz channels to make a 40 MHz channel yields slightly more than double the benefit, because the guard band between the channels can be used for data. Similarly, an 80 MHz channel gives slightly more than double the benefit of a 40 MHz channel.</Paragraph>
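        <Paragraph>The ‘slightly more than double’ benefit of channel bonding can be seen in the subcarrier counts. In 802.11n a 20 MHz channel carries 52 data subcarriers, while a bonded 40 MHz channel carries 108 (the 108 figure comes from the 802.11n specification, not from the text above); a Python sketch of the comparison:</Paragraph>

```python
# Why bonding two 20 MHz channels gives slightly more than double the
# capacity: 802.11n has 52 data subcarriers per 20 MHz channel but 108
# in a bonded 40 MHz channel (figure taken from the 802.11n spec),
# because subcarriers in the former guard band become usable.
DATA_SUBCARRIERS_20MHZ = 52
DATA_SUBCARRIERS_40MHZ = 108

gain = DATA_SUBCARRIERS_40MHZ / DATA_SUBCARRIERS_20MHZ
print(f"{gain:.3f}x")  # 2.077x - a little more than double
```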
        <Paragraph>802.11n and ac can use Multiple Input Multiple Output (MIMO), which takes advantage of the multiple routes radio waves can take in a reflective environment (such as indoors) to enable multiple streams of data to be transmitted on the same frequency. Using MIMO increases the transmission rate in proportion to the number of spatial streams used. Table 2 gives maximum transmission rates for a single spatial stream.</Paragraph>
        <Paragraph>802.11ad is clearly exceptional, both in the band it occupies (60 GHz) and in its channel width (2160 MHz). The appeal of the 60 GHz band lies in the large amount of licence-free spectrum available – hence the prospect of channels that are about 100 times wider than the 20 MHz channel width of, for example, 802.11g. With such wide channels, transmission rates of several gigabits per second become possible; thus 802.11ad is also known as ‘WiGig’ or gigabit-rate WiFi. </Paragraph>
        <Paragraph>A basic principle of the 802.11 standards is backwards compatibility, whereby later variants are compatible with earlier variants. For example, a device equipped for 802.11ac – which is a 5 GHz-only standard – will ‘fall back’ to earlier 5 GHz standards, namely 802.11a and n if there is a WiFi access point using one of those earlier standards. In those circumstances an 802.11ac device will perform no better than a device designed for the earlier standards. </Paragraph>
      </Section>
      <Section>
        <Title>4.7 WiFi maximum transmission rate</Title>
        <Paragraph>The 20 MHz channel width of common versions of WiFi is divided into 64 OFDM subchannels, each 312.5 kHz wide. However, not all 64 subchannels are used. For example, in 802.11n the three subcarriers at the lower end of the channel and the four subcarriers at the upper end are nulled (see Figure 30). </Paragraph>
        <Paragraph>This results in power being transmitted across a band approximately 17.8 MHz wide within the 20 MHz band. This is done because OFDM is noted for the high level of power that spills out on either side of the transmission band. Restricting transmission to a 17.8 MHz band ensures that most transmitted power stays within the 20 MHz channel.</Paragraph>
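        <Paragraph>The 17.8 MHz figure can be checked by counting subchannel slots. This Python sketch assumes the occupied band runs from the lowest used subchannel slot to the highest (the central nulled subcarrier lies inside that span):</Paragraph>

```python
# Rough check of the occupied bandwidth in a 20 MHz 802.11n channel:
# 64 subchannels of 312.5 kHz fill 20 MHz; nulling 3 at the lower edge
# and 4 at the upper edge leaves a span of 57 subchannel slots.
SUBCHANNEL_WIDTH = 312.5e3   # Hz
TOTAL_SUBCHANNELS = 64
EDGE_NULLED = 3 + 4          # nulled subcarriers at the channel edges

channel = TOTAL_SUBCHANNELS * SUBCHANNEL_WIDTH                 # 20 MHz
occupied = (TOTAL_SUBCHANNELS - EDGE_NULLED) * SUBCHANNEL_WIDTH

print(f"{channel / 1e6:.1f} MHz, {occupied / 1e6:.1f} MHz")  # 20.0 MHz, 17.8 MHz
```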
        <Figure>
          <Image src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/tm355_bk3_pt2_f036.eps" width="100%" src_uri="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/tm355_bk3_pt2_f036.eps" x_printonly="y" x_folderhash="596a8165" x_contenthash="d4d04fb8" x_imagesrc="tm355_bk3_pt2_f036.eps.png" x_imagewidth="319" x_imageheight="203"/>
          <Caption><b>Figure 30</b> Subchannel nulling used in an 802.11n channel</Caption>
        </Figure>
        <Paragraph>The subcarrier on the central frequency is also nulled. This enables receivers to locate the centre of the transmission band, and is also done for other reasons related to the way subcarriers are modulated and demodulated.</Paragraph>
        <Paragraph>The 8 nulled subchannels shown in Figure 30 leave 56 usable subchannels from the total of 64. Four of the 56 are used for control data, leaving 52 subchannels for user data. A symbol rate of 250 × 10<sup>3</sup> symbols per second per subchannel is standard across the common WiFi versions. </Paragraph>
        <Paragraph>From this information the maximum transmission rate across all 52 subchannels of 802.11n can be calculated. The calculation is based on the use of the highest order of modulation, which is 64-QAM in 802.11n. This gives 6 bits per symbol. The number of bits per second across the 52 usable subchannels is:</Paragraph>
        <Equation>
          <Image>52 × (250 × 10<sup>3</sup> symbols s<sup>−1</sup>) × 6 bits symbol<sup>−1</sup> = 78 Mbit s<sup>−1</sup></Image>
        </Equation>
        <Paragraph>The highest code rate for 802.11n is 5/6. That is, only 5 bits in 6 are ‘useful’ data, the sixth bit being redundancy for error control. Hence the maximum transmission rate is:</Paragraph>
        <Equation>
          <Image>(5/6) × 78 Mbit s<sup>−1</sup> = 65 Mbit s<sup>−1</sup></Image>
        </Equation>
        <Paragraph>This is the figure given in Table 2. A rate of 600 Mbit s<sup>−1</sup> is usually claimed for 802.11n. This is based on the use of 40 MHz of spectrum (slightly more than doubling the 65 Mbit s<sup>−1 </sup>calculated above) and four MIMO streams (quadrupling the transmission rate relative to a single stream).</Paragraph>
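        <Paragraph>The whole calculation above can be collected into a few lines. This Python sketch reproduces the 78 Mbit/s raw rate and the 65 Mbit/s maximum transmission rate for a single 802.11n stream in a 20 MHz channel:</Paragraph>

```python
# 802.11n single-stream, 20 MHz maximum rate from first principles:
# 52 data subchannels, 250k symbols/s each, 6 bits per symbol (64-QAM),
# and a 5/6 code rate.
DATA_SUBCHANNELS = 52
SYMBOL_RATE = 250e3    # symbols per second per subchannel
BITS_PER_SYMBOL = 6    # 64-QAM
CODE_RATE = 5 / 6      # 5 useful bits in every 6 transmitted

raw = DATA_SUBCHANNELS * SYMBOL_RATE * BITS_PER_SYMBOL  # 78 Mbit/s
useful = raw * CODE_RATE                                # 65 Mbit/s

print(f"{useful / 1e6:.0f} Mbit/s")  # 65 Mbit/s
```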
      </Section>
      <Section>
        <Title>4.8 WiFi multiple access</Title>
        <Paragraph>Although WiFi uses OFDM, making it similar in some respects to 4G, the method of providing multiple access to the radio channel is very different from 4G’s. WiFi doesn’t use resource blocks. Also, the two directions of data traffic (towards and away from the user) are not differentiated by frequency band. The following activity shows how multiple access is provided in WiFi. </Paragraph>
        <Activity>
          <Heading>Activity 21 Animation</Heading>
          <Question>
            <Paragraph>The following animation will explain how WiFi allocates access to the radio channel among its various users. Open it by clicking on the image or ‘View interactive version’ text below, then watch and listen to the six sections of the animation.</Paragraph>
            <MediaContent src="https://www.open.edu/openlearn/pluginfile.php/871656/mod_oucontent/oucontent/44735/wifi.zip" type="html5" width="600" height="600" webthumbnail="true" id="x_imd_act_4" x_folderhash="596a8165" x_contenthash="489a789e" x_xhtml="y" x_smallsrc="wifi.zip.jpg" x_smallfullsrc="https://openuniv.sharepoint.com/sites/tmodules/tm355/lmimages/wifi.zip.jpg" x_smallwidth="512" x_smallheight="300">
              <Description><Paragraph>This animation consists of six sections.</Paragraph><Paragraph>Section 1 starts with a pictorial diagram of five devices. One is a rectangular box with two antennas at the back, which is labelled ‘access point (AP)’. The other four are a laptop computer, a tablet device, a smartphone and a wireless printer.</Paragraph><Paragraph>A diagram of the 3G and 4G networks is then shown to illustrate cellular mobile communications. The figure shows 3G and 4G networks, with the 3G network on the left of the diagram and the 4G network on the right. Each network is divided by a horizontal dotted line, with the core network above the line and the access network below the line.</Paragraph><Paragraph>In the 3G network there is a series of base stations at the bottom of the diagram in the access network area, called Node Bs. Above these, but still in the network access area, are four radio network controllers (RNCs), represented by rectangles. Each RNC has two or more Node Bs, or base stations, connected to it.</Paragraph><Paragraph>Above the RNCs, in the core network area, are two rectangles labelled ‘serving GPRS support node (SGSN)’. There are various connections between the RNCs and the SGSNs, shown as blue lines. From the top of each RNC, there are two connections, one labelled ‘voice’ and one labelled ‘data’. The data connections go to the SGSN that is closest to the RNC in question. All the voice connections go to a rectangle labelled ‘voice switching’. This is in the core network.</Paragraph><Paragraph>Above the SGSNs, also in the core network area, is a further rectangle labelled ‘gateway SGSN (GGSN)’. A curved arrow at the top of this rectangle indicates a connection to other networks.</Paragraph><Paragraph>In the 4G network there is a series of base stations at the bottom of the diagram in the access network area, called eNode Bs. Above these, in the core network area, are two rectangles labelled ‘serving gateway (S-GW)’. 
To the left of these is a further rectangle, labelled ‘mobility management entity’. This has a direct connection, shown by a blue line, to one of the SGSNs in the 3G network.</Paragraph><Paragraph>Above the two serving gateway rectangles is a further rectangle, labelled ‘packet data network’. A curved arrow at the top of this rectangle indicates a connection to other networks.</Paragraph><Paragraph>All of the 4G network elements are within a cloud labelled ‘IP network’.</Paragraph><Paragraph>The animation then returns to the five devices shown at the beginning. All five are labelled as nodes. After this, double-headed arrows are shown between the access point and each of the other four devices, indicating how this group of nodes could be a Basic Service Set.</Paragraph><Paragraph>The access point is then shown by itself. Curved lines radiate out from each of the two antennas, labelled ‘Sarah’s network’.</Paragraph><Paragraph>Section 2 of the animation shows an access point, a smartphone and a laptop computer, all with curved lines radiating from them at the same time. The curved lines overlap and clash with each other.</Paragraph><Paragraph>The text ‘CSMA/CA’ and ‘Carrier Sense Multiple Access with Collision Avoidance’ is then shown on screen when this is mentioned by the narration.</Paragraph><Paragraph>Next, four devices are shown on screen: an access point, two laptop computers and a smartphone. One laptop is shown with curved lines radiating out from it towards the access point. The other laptop and the phone are labelled as ‘monitoring the radio channel’.</Paragraph><Paragraph>A laptop computer and a smartphone are then shown by themselves, each with a clock showing 00:00:12. They count down to 00:00:00 simultaneously. After this, new amounts of time are shown on each clock, but this time the two times are different: the phone has 00:00:25 and the laptop has 00:00:15. 
Again, they begin counting down together, but this time the laptop reaches 00:00:00 when the phone has only got down to 00:00:10.</Paragraph><Paragraph>The laptop and phone are then shown next to the access point. The laptop, which has 00:00:00 on its clock, has concentric circles radiating from it to show that it is communicating with the access point. The phone, which has 00:00:10 on its clock, does nothing.</Paragraph><Paragraph>Finally, the phone with 00:00:10 on its clock is shown by itself and counts down to 00:00:00, indicating that it would continue to count down from 00:00:10 next time round rather than start again with a new number.</Paragraph><Paragraph>Section 3 of the animation starts by showing the laptop computer and smartphone from the previous section, each with a clock showing 00:00:12.</Paragraph><Paragraph>After this, an access point and laptop computer are shown, with curved lines radiating out from the access point towards the laptop. A rectangle marked ‘frame’ is shown moving across from the access point to the laptop. The same access point and laptop are then shown with two other devices, another laptop and a smartphone. The original laptop has a clock showing 00:00:05, whereas the other laptop and the phone have clocks showing 00:00:12. They all start counting down simultaneously, but the first laptop reaches 00:00:00 when the other two have only got down to 00:00:07. Curved lines are then shown radiating from the first laptop to the access point, and a rectangle marked ‘ACK’ is shown moving across from the laptop to the access point.</Paragraph><Paragraph>In Section 4, two houses are shown side by side. House 1 contains an access point, a smartphone and a laptop computer. House 2 contains an access point, a laptop computer and a wireless printer. 
Curved lines radiate from one device to another, as follows:</Paragraph><BulletedList><ListItem>the laptop to the access point in house 1</ListItem><ListItem>the access point in all directions in house 2</ListItem><ListItem>the printer to the access point in house 2</ListItem><ListItem>the access point in all directions in house 1.</ListItem></BulletedList><Paragraph>Section 5 shows a screenshot of all the WiFi signals detected in a shopping centre. The signals are shown on a graph where the horizontal axis is labelled channel number from 1 to 11. The vertical axis is labelled signal strength in dBm, and ranges from minus 30 at the top of the axis to minus 90 at the bottom. Approximately 25 signals are shown, each with its own network address. The majority of the signals are clustered in three groups: one centred on channel 1, one centred on channel 6 and one centred on channel 11. There is little to no overlap between the three groups. Most of the signals have a flat top and very steeply sloping sides, but a few have sides that rise less steeply to a rounded peak.</Paragraph><Paragraph>Another, similar graph is then shown, with channel numbers from 1 to 6 on the horizontal axis, and signal strength in dBm from minus 40 down to minus 100 on the vertical axis. This time only one WiFi signal is shown, centred on channel 1. It has a flat top and sides that are vertical nearer the top, then slope outward. A remote control signal then appears. This is centred on channel 3 and has the shape of a gently sloping peak. Between channel 1 and channel 3, the two signals overlap.</Paragraph><Paragraph>Section 6, the final section of the animation, shows a horizontal axis marked with time in slots. The end of one frame is shown as a blue box at the left-hand end of the axis. The interval covering the next six time slots is labelled ‘check period’. The interval covering the 14 slots after that is labelled ‘contention window’ and the slots within it are labelled ‘back-off slots’. 
Finally, another blue box covering the next seven slots is labelled ‘next transmission’ and the left-hand edge of this box is labelled ‘data transmission begins’.</Paragraph><Paragraph>Three frames are then shown, each represented as a grey rectangle. The three rectangles join together to make one long rectangle, and this is labelled ‘aggregated frame’.</Paragraph><Paragraph>Finally, the animation returns to the diagram shown at the very beginning of Section 1, in which there are five nodes: an access point, a laptop computer, a tablet device, a smartphone and a wireless printer. Curved lines radiate out from the access point to all four of the other devices, representing infrastructure mode. Then curved lines radiate out from the tablet to the laptop, and from the phone to the printer, representing ad hoc mode.</Paragraph></Description>
            </MediaContent>
          </Question>
        </Activity>
      </Section>
      <Section>
        <Title>4.9 Summary</Title>
        <Paragraph>Orthogonal frequency division multiplexing (OFDM), which divides a communication channel into narrow subchannels (each with its own subcarrier), provides a spectrally efficient and flexible way to use the channel. OFDM has become an almost ubiquitous modulation technique in both wired and wireless communication.</Paragraph>
        <Paragraph>The use of QAM modulation for the subcarriers of OFDM is also ubiquitous. Using various orders of QAM (depending on noise conditions) further adds to the flexibility of OFDM.</Paragraph>
        <Paragraph>In DSL broadband, a version of OFDM called Discrete Multitone (DMT) is used because of both its spectral efficiency and its adaptability to unpredictable and variable noise conditions.</Paragraph>
        <Paragraph>In 4G, Orthogonal Frequency Division Multiple Access (OFDMA) is used as a flexible way to share access to a radio channel among multiple, transient users. This is done by grouping subchannels into units of resource which are allocated to users dynamically as fluctuating demand requires.</Paragraph>
        <Paragraph>In WiFi, OFDM is exploited as a way of making very efficient use of the available channels. The subchannels are not used for access (as in 4G), but selective nulling of subchannels is used to ensure that transmission power is properly restricted. Multiple access in WiFi is based on a ‘listen before transmitting’ protocol known as Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA).</Paragraph>
      </Section>
    </Session>
    <Session>
      <Title>Conclusion</Title>
      <Paragraph>The development of communication technology is, as much as anything, a story of the strategies for coping with adversity. ‘Adversity’ in this context means, for example, noise and interference, which are unavoidable and which cause error. It also refers to the unpredictable and uncontrollable characteristics of many communication channels, such as telephone lines and radio environments. Strategies for dealing with adversity here include the use of compression to minimise the amount of data that needs to be transmitted, the use of subchannels which enable particular parts of the frequency spectrum to be treated independently of others, and the variable orders of QAM, which enable data throughput to be tailored to prevailing conditions.</Paragraph>
      <Paragraph>All the topics in this free course, <i>Exploring communications technology</i>, and more, are covered in greater length and depth in the Open University course <a href="http://www.openuniversity.edu/courses/modules/tm355">TM355 <i>Communications technology</i></a>, which can be studied as a stand-alone course or as part of the University’s Computing and IT BSc degree.</Paragraph>
    </Session>
    <Session id="__acknowledgements">
      <Title>Acknowledgments</Title>
      <Paragraph>This free course was written by Allan Jones, David Chapman, Helen Donelan, Laurence Dooley, and Adrian Poulton.</Paragraph>
      <Paragraph>This free course includes adapted extracts from the course TM355 <i>Communications technology</i>. If you are interested in this subject and want to study formally with us, you may wish to explore other courses we offer in Technology.</Paragraph>
      <Paragraph>Except for third party materials and otherwise stated (see <a href="http://www.open.ac.uk/conditions">terms and conditions</a>), this content is made available under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_GB">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Licence</a>.</Paragraph>
      <Paragraph>The material included with this course is entirely the property of the Open University.</Paragraph>
      <Paragraph><b>Don't miss out</b></Paragraph>
      <Paragraph>If reading this text has inspired you to learn more, you may be interested in joining the millions of people who discover our free learning resources and qualifications by visiting The Open University – <a href="http://www.open.edu/openlearn/free-courses?LKCAMPAIGN=ebook_&amp;MEDIA=ol">www.open.edu/openlearn/free-courses</a>.</Paragraph>
    </Session>
  </Unit>
  <BackMatter><!--To be completed where appropriate: 
<Glossary><GlossaryItem><Term/><Definition/></GlossaryItem>
</Glossary><References><Reference/></References>
<FurtherReading><Reference/></FurtherReading>--></BackMatter>
<settings>
    <numbering>
        <Session autonumber="false"/>
        <Section autonumber="false"/>
        <SubSection autonumber="false"/>
        <SubSubSection autonumber="false"/>
        <Activity autonumber="false"/>
        <Exercise autonumber="false"/>
        <Box autonumber="false"/>
        <CaseStudy autonumber="false"/>
        <Quote autonumber="false"/>
        <Extract autonumber="false"/>
        <Dialogue autonumber="false"/>
        <ITQ autonumber="false"/>
        <Reading autonumber="false"/>
        <StudyNote autonumber="false"/>
        <Example autonumber="false"/>
        <Verse autonumber="false"/>
        <SAQ autonumber="false"/>
        <KeyPoints autonumber="false"/>
        <ComputerDisplay autonumber="false"/>
        <ProgramListing autonumber="false"/>
        <Summary autonumber="false"/>
        <Tables autonumber="false"/>
        <Figures autonumber="false"/>
        <MediaContent autonumber="false"/>
        <Chemistry autonumber="false"/>
    </numbering>
    <discussion_alias>Discussion</discussion_alias>
    <session_prefix/>
<version>2024042601</version></settings></Item>
