Exploring communications technology

3.6 MPEG-4 AAC (advanced audio coding)

MPEG-4 AAC (advanced audio coding) was designed as the successor to MP3 for low-bit-rate perceptual audio compression, with efficient internet multimedia streaming applications in mind. Its development was also motivated by the quest for efficient coding of multichannel surround-sound signals. So-called ‘5.1 surround sound’ includes five full-bandwidth channels (left, right, centre, left surround and right surround), with the ‘point 1’ referring to a dedicated low-frequency effects (LFE) channel carrying bass information in the 3 to 120 Hz band.

AAC has now been formally embedded in both the MPEG-2 and MPEG-4 audio standards; it is the default format for various multimedia applications and services, from YouTube to Apple’s iTunes. The broad consensus is that, subjectively, the AAC encoder (.mp4 files) provides better audio quality than MP3 at the same bit rate, with greater flexibility and functionality. In comparison with MP3, AAC offers a range of sampling rates up to 96 kHz, and also supports up to 48 channels (mono, stereo and multichannel surround sound). In terms of coding, it transforms blocks of either 2048 or 256 samples (giving 1024 or 128 frequency bands respectively), compared with the 32 sub-bands of the MP3 filter bank, thus providing better frequency resolution for the psychoacoustic modelling and perceptual masking steps.
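
The effect of the band count on frequency resolution can be illustrated with some simple arithmetic. The following Python sketch is illustrative only: it assumes a sampling rate of 48 kHz (an example value, not one taken from the text), assumes that the audio band from 0 Hz to half the sampling rate is divided into equal-width bands, and assumes that the AAC block lengths of 2048 and 256 samples yield half as many frequency bands (1024 and 128). The function name band_width_hz is hypothetical.

```python
def band_width_hz(sample_rate_hz: float, num_bands: int) -> float:
    """Width of one band when 0 Hz to the Nyquist frequency is split equally."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz / num_bands


SAMPLE_RATE_HZ = 48_000  # assumed example sampling rate

# Band counts: 32 for the MP3 filter bank (from the text); the AAC counts
# assume that a block of N samples gives N / 2 frequency bands.
CODECS = [
    ("MP3 filter bank", 32),
    ("AAC, 256-sample block", 256 // 2),
    ("AAC, 2048-sample block", 2048 // 2),
]

for label, bands in CODECS:
    width = band_width_hz(SAMPLE_RATE_HZ, bands)
    print(f"{label}: {bands} bands, about {width:.1f} Hz per band")
```

Under these assumptions each of the 32 MP3 sub-bands is 750 Hz wide, whereas the longer AAC block gives bands of roughly 23 Hz, which is why the masking model can be applied much more selectively.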

Another noteworthy feature of AAC encoders is that audio files do not have to be encoded for one specific streaming bit rate. Instead, the file is coded once, then streamed at a variable bit rate depending on the connection speed and network traffic conditions. This is a consequence of AAC supporting scalable representations in terms of sample amplitudes (or S/N ratio) and sampling rates.
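
The idea of coding once and streaming at whatever rate the connection allows can be sketched as a stack of layers: a base layer that is always needed, plus enhancement layers that each add quality. The Python sketch below is a simplified illustration of that idea, not a description of the actual AAC scalability tools. The layer names, the bit rates and the function select_layers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    bit_rate_kbps: int  # extra bit rate that this layer adds to the stream


# Hypothetical layer stack produced by a single encoding pass.
LAYERS = [
    Layer("base", 16),
    Layer("enhancement 1", 16),
    Layer("enhancement 2", 32),
]


def select_layers(layers, available_kbps):
    """Return the leading layers whose cumulative bit rate fits the connection."""
    chosen = []
    total_kbps = 0
    for layer in layers:
        # Layers build on one another, so stop at the first one that does not fit.
        if total_kbps + layer.bit_rate_kbps > available_kbps:
            break
        chosen.append(layer)
        total_kbps += layer.bit_rate_kbps
    return chosen, total_kbps


chosen, total = select_layers(LAYERS, available_kbps=40)
print([layer.name for layer in chosen], total)
```

With 40 kbit/s available, this sketch sends the base layer and the first enhancement layer (32 kbit/s in total); if the connection improves, the server can add the next layer without re-encoding the file.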

MPEG-4 AAC and its variants excel at low bit rates by virtue of a series of extensions and tools that have evolved and subsequently become embedded into the standard. Figure 3.8 identifies three key tools that have been instrumental in the advancement of this standard:

perceptual noise substitution (PNS)

spectral band replication (SBR)

parametric stereo (PS)

Further information on each of these is readily available on the Web. While each tool to some extent adds complexity to the encoder, it also provides notable improvements in coding efficiency and corresponding audio quality.

Figure 3.8 MPEG-AAC audio encoder family

The figure shows three squares in a row, representing the key tools that have been instrumental in the advancement of the MPEG-4 AAC family of standards. From left to right, these are perceptual noise substitution (PNS), spectral band replication (SBR) and parametric stereo (PS).

Above the three squares is a row of four rectangles. From left to right, these are labelled MPEG-2 AAC-LC, MPEG-4 AAC-LC, MPEG-4 HE-AAC and MPEG-4 HE-AAC v2. Between each pair of rectangles is a short horizontal arrow pointing to the right, so three arrows in total.

The three squares are connected to the three arrows between the four rectangles. The connections are shown by three dashed vertical lines.

AAC-LC (low complexity) is the most widely used coding profile in this standard, and the default format for Apple’s iTunes. Since AAC involves many varied processes in analysing different types of audio signal, no single algorithm is able to meet the diverse set of requirements it must fulfil. Therefore AAC has integrated different applications into a single framework covering music synthesis, low-bit-rate speech coding, text-to-speech synthesis and general perceptual audio compression across a host of different bit rates.

A more recent AAC extension is High-Efficiency AAC (HE-AAC), also known as aacPlus. It is specifically optimised for very-low-bit-rate applications such as audio streaming and podcasting, and is now the standard technology used in digital radio broadcasting. It embraces SBR technology to encode and store high-frequency information as part of the standard, and is able to deliver near-CD-quality sound at 64 kbit s⁻¹. At the time of writing, the most recent version is HE-AAC version 2, which employs the third major extension in Figure 3.8, parametric stereo (PS), to improve the audio quality at low bit rates and increase compression by up to 40%. PS analyses the spatial characteristics between the left and right channels of a stereo signal to exploit inter-channel redundancies. It characterises the inter-channel features of the stereo signal and, depending on the source, can provide a bit-rate saving of up to a factor of 10.
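
The principle behind parametric stereo can be sketched in a few lines of Python: instead of coding two full channels, the encoder keeps a single mono downmix together with a handful of parameters that describe how the left and right channels relate. The sketch below is a deliberately simplified illustration under stated assumptions. It works on one whole frame of time-domain samples and computes only a level difference and a correlation, whereas a real PS encoder derives its parameters separately for many frequency bands. The function name and the sample values are hypothetical.

```python
import math


def parametric_stereo_sketch(left, right):
    """Reduce a stereo frame to a mono downmix plus two spatial parameters."""
    if len(left) != len(right) or not left:
        raise ValueError("channels must be non-empty and equal in length")

    # Mono downmix: the only waveform that would actually be coded.
    mono = [(l + r) / 2 for l, r in zip(left, right)]

    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))

    eps = 1e-12  # guards against division by zero for silent frames
    # Inter-channel level difference in dB (positive means left is louder).
    level_diff_db = 10 * math.log10((energy_left + eps) / (energy_right + eps))
    # Normalised inter-channel correlation, between -1 and 1.
    correlation = cross / math.sqrt((energy_left + eps) * (energy_right + eps))
    return mono, level_diff_db, correlation


# Hypothetical frame: the right channel is the left channel at half amplitude.
left = [0.5, -0.5, 0.5, -0.5]
right = [0.25, -0.25, 0.25, -0.25]
mono, level_diff_db, correlation = parametric_stereo_sketch(left, right)
print(f"level difference {level_diff_db:.2f} dB, correlation {correlation:.2f}")
```

A decoder built on the same idea would spread the mono signal back into two channels using the stored parameters, which take far fewer bits than a second audio channel.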

Activity 3.5 Exploratory

This ‘Audio coding’ activity allows you to compare several versions of the same audio sample that have been compressed using different standards.

In this activity you will hear a sample of speech that has been processed with different compression formats. In the order in which you will hear the speech samples, the formats used are the following four:

MP3
AAC-LC
HE-AAC v1
HE-AAC v2.

This is theoretically the order of increasing quality.

All four extracts are at a bit rate of 16 kbit s⁻¹. This low bit rate has been chosen to emphasise the differences in quality between the formats, which are less noticeable at higher bit rates.

The speech extract used consists of the following two sentences:

In my garden I have an apple tree, a hazel tree and a pine tree. My neighbours have an apple tree too.

With each repetition the quality should improve, although many people find little difference between the second and third versions (AAC-LC and HE-AAC v1).

Play the audio clip now.

Audio clip: tm355_bk2_pt3_oa3-11_a001.mp3

Since the greatest difference is between the first and last extracts in the above sample, the following sample uses just those extracts (that is, MP3 followed by HE-AAC v2).

Audio clip: tm355_bk2_pt3_oa3-11_a002.mp3
