3.6 MPEG-4 AAC (advanced audio coding)
MPEG-4 AAC (advanced audio coding) was designed as the successor to MP3 for low-bit-rate perceptual audio compression, with efficient internet multimedia streaming applications in mind. Its development was also motivated by the quest for efficient coding of multichannel surround-sound signals. So-called ‘5.1 surround sound’ comprises five full-bandwidth channels (left, right, centre, left surround and right surround), with the ‘point 1’ referring to a dedicated low-frequency effects (LFE) channel carrying bass information in the 3 to 120 Hz band.
AAC has now been formally embedded in both the MPEG-2 and MPEG-4 audio standards; it is the default format for various multimedia applications and services, from YouTube to Apple’s iTunes. The broad consensus is that, subjectively, the AAC encoder (.mp4 files) provides better audio quality than MP3 at the same bit rate, with greater flexibility and functionality. Compared with MP3, AAC offers a wider range of sampling rates, up to 96 kHz, and also supports up to 48 channels (mono, stereo and multichannel surround sound). In terms of coding, it uses transform blocks of either 2048 or 256 samples (giving 1024 or 128 frequency lines), compared with the 32 sub-bands of the MP3 filter bank, thus providing better frequency resolution for the psychoacoustic modelling and perceptual masking steps.
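To get a feel for what finer frequency resolution means in practice, the following short Python sketch estimates the width of each frequency band for a given sampling rate. It is only an illustration: it assumes the bands are spaced uniformly between 0 Hz and half the sampling rate, and that AAC’s long and short transform blocks yield 1024 and 128 frequency lines respectively. The function name band_width_hz and the 48 kHz sampling rate are chosen purely for this example.

```python
def band_width_hz(sample_rate_hz, n_bands):
    """Width of one band when n_bands uniformly span 0 Hz to sample_rate/2."""
    nyquist_hz = sample_rate_hz / 2  # highest representable frequency
    return nyquist_hz / n_bands


SAMPLE_RATE_HZ = 48_000  # illustrative sampling rate (an assumption)

# Band counts: 32 for the MP3 filter bank; 128 and 1024 are the assumed
# numbers of frequency lines for AAC short and long blocks.
for label, n_bands in [("MP3 filter bank", 32),
                       ("AAC short block", 128),
                       ("AAC long block", 1024)]:
    width = band_width_hz(SAMPLE_RATE_HZ, n_bands)
    print(f"{label}: {n_bands} bands, about {width:.1f} Hz per band")
```

Under these assumptions each of the 32 bands is 750 Hz wide, whereas 1024 frequency lines are each only about 23 Hz wide, which is why the psychoacoustic model can place quantisation noise much more precisely.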
Another noteworthy feature of AAC encoders is that audio files do not have to be encoded at a specific streaming speed. Instead the file is coded once, then streamed at a variable bit rate depending on the connection speed and network traffic conditions. This is a consequence of AAC supporting scalable representations in terms of sample amplitudes (or S/N ratio) and sampling rates.
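The idea of coding once and then streaming at a rate matched to the connection can be sketched as follows. This is a minimal illustration in Python, assuming that the encoded file consists of a base layer plus enhancement layers that each add quality. The layer bit rates and the function name select_layers are hypothetical and are not taken from the AAC standard.

```python
def select_layers(layer_rates_kbps, available_kbps):
    """Choose how many layers to send, base layer first.

    Returns (number_of_layers, total_bit_rate_kbps). Layers are added in
    order until the next one would exceed the available bit rate. If even
    the base layer does not fit, the result is (0, 0).
    """
    total = 0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        count += 1
    return count, total


# Hypothetical scalable stream: a 16 kbit/s base layer and two
# enhancement layers of 16 and 32 kbit/s.
LAYERS_KBPS = [16, 16, 32]

for connection_kbps in (20, 40, 100):
    n, used = select_layers(LAYERS_KBPS, connection_kbps)
    print(f"{connection_kbps} kbit/s available: send {n} layer(s), {used} kbit/s")
```

The point of the sketch is that the server never re-encodes the audio: it simply sends fewer or more of the layers already present in the file as network conditions change.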
MPEG-4 AAC and its variants excel at low bit rates by virtue of a series of extensions and tools that have evolved and subsequently become embedded into the standard. Figure 3.8 identifies three key tools that have been instrumental in the advancement of this standard:
- perceptual noise substitution (PNS)
- spectral band replication (SBR)
- parametric stereo (PS).
Further information on each of these is readily available on the Web. While each tool to some extent adds complexity to the encoder, it also provides notable improvements in coding efficiency and corresponding audio quality.
AAC-LC (low complexity) is the most widely used coding profile in this standard, and the default format for Apple’s iTunes. Since AAC involves many varied processes in analysing different types of audio signal, no single algorithm is able to meet the diverse set of requirements it must fulfil. Therefore AAC has integrated different applications into a single framework covering music synthesis, low-bit-rate speech coding, text-to-speech synthesis and general perceptual audio compression across a host of different bit rates.
A more recent AAC extension is High-Efficiency AAC (HE-AAC), also known as aacPlus. It is specifically optimised for very-low-bit-rate applications such as audio streaming and podcasting, and is now the standard technology used in digital radio broadcasting. It uses SBR technology to encode and store high-frequency information as part of the standard, and is able to deliver near-CD quality sound at 64 kbit s−1. At the time of writing, the most recent version is HE-AAC version 2, which adds the third major extension in Figure 3.8, parametric stereo (PS), to improve the audio quality at low bit rates and increase compression by up to 40%. PS analyses the spatial characteristics of the left and right channels of a stereo signal in order to exploit inter-channel redundancies and, depending on the source, can provide a bit-rate saving of up to a factor of 10.
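To put the figure of 64 kbit s−1 into context, the short Python sketch below compares it with the bit rate of uncompressed CD audio. The CD parameters (44.1 kHz sampling, 16 bits per sample, two channels) are the usual published values rather than figures given in this section, and the function names are illustrative only.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(uncompressed_kbps, coded_kbps):
    """How many times smaller the coded stream is than the original."""
    return uncompressed_kbps / coded_kbps


# CD audio: 44.1 kHz, 16 bits per sample, stereo (assumed reference values).
cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
ratio = compression_ratio(cd_kbps, 64)

print(f"CD audio: {cd_kbps:.1f} kbit/s")
print(f"Compression ratio at 64 kbit/s: about {ratio:.0f} to 1")
```

On these assumptions, uncompressed CD audio runs at about 1411 kbit s−1, so a 64 kbit s−1 stream represents a compression ratio of roughly 22 to 1.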
Activity 3.5 Exploratory
This ‘Audio coding’ activity allows you to compare several versions of the same audio sample that have been compressed using different standards.
In this activity you will hear a sample of speech that has been processed with different compression formats. In the order in which you will hear the speech samples, the formats used are the following four:
- MP3
- AAC LC
- HE-AAC v1
- HE-AAC v2.
This is theoretically the order of increasing quality.
All four extracts are at a bit rate of 16 kbit s−1. This low bit rate has been chosen to emphasise the differences in quality between the formats, which are less noticeable at higher bit rates.
The speech extract used consists of the following two sentences:
In my garden I have an apple tree, a hazel tree and a pine tree. My neighbours have an apple tree too.
With each repetition the quality should improve, although many people find little difference between the second and third versions (AAC LC and HE-AAC v1).
Play the audio clip now.
Since the greatest difference is between the first and last extracts in the above sample, the following sample uses just those extracts (that is, MP3 followed by HE-AAC v2).