Exploring communications technology

3.5 MPEG audio layer 3 (MP3)

In connection with frequency masking, we said that the masked sound B in Figure 3.5 did not need to be encoded. You might wonder how sounds can be selectively encoded if others are present at the same time. The answer is to split the audio band into sub-bands that are encoded separately. If a masked sound occupies a different sub-band from the masking sound, the masked sound can be ignored and the masking sound encoded.

Figure 3.7 shows the elements of the creation of an MP3 audio file. The source input is generally assumed to be an audio data stream from either a CD (f_s = 44.1 kHz) or studio-recorded material (f_s = 48 kHz). The signal is filtered into 32 critical frequency sub-bands that are designed to reflect the way the ear perceives sounds.

Figure 3.7 MP3 encoder

The diagram shows three rectangles depicting a block diagram of an MP3 encoder. There are two rectangles side by side at the top of the diagram. The rectangle on the left is labelled ‘32 critical frequency filter bank’. The rectangle on the right is labelled ‘allocate number of bits’, and above this rectangle is the label ‘VBR coder’.

The third rectangle is below and between the two upper rectangles. It is labelled ‘compute masking levels’. Beneath this rectangle there is also the label ‘psychoacoustic model’.

The inputs to the MP3 encoder are shown as a blue arrow that enters from the left and splits to point at the two rectangles labelled ‘32 critical frequency filter bank’ and ‘compute masking levels’.

The output of the ‘32 critical frequency filter bank’ rectangle is shown as another blue arrow that splits to point at the two rectangles labelled ‘compute masking levels’ and ‘allocate number of bits’.

The output of the ‘compute masking levels’ rectangle is shown as another blue arrow that points at the ‘allocate number of bits’ rectangle.

Finally, the output of the ‘allocate number of bits’ rectangle is shown as a blue arrow pointing to the right. This is labelled ‘MP3 bitstream’.

The 32 critical sub-bands are sampled separately, yet this does not increase the total number of samples beyond what would be required if the audio band were not split into sub-bands. Sub-bands typically have a width of 750 Hz, for which the sampling theorem requires a minimum sampling rate of 2 × 0.75 kHz = 1.5 kHz. Therefore, across the 32 sub-bands, the minimum number of samples per second must be 32 × (1.5 × 10³) or 48 × 10³. This is exactly the same as for a single band with a total bandwidth of 32 × 0.75 kHz = 24 kHz, for which the sampling theorem requires the minimum sampling rate to be 48 kHz.
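The arithmetic above can be checked with a short Python sketch. The function and constant names are illustrative only and are not part of any MP3 specification; the figures (32 sub-bands, 750 Hz width) are the ones quoted in the text.

```python
def min_sampling_rate_hz(bandwidth_hz: float) -> float:
    """Minimum sampling rate given by the sampling theorem: twice the bandwidth."""
    return 2 * bandwidth_hz


NUM_SUB_BANDS = 32          # number of sub-bands quoted in the text
SUB_BAND_WIDTH_HZ = 750.0   # typical sub-band width quoted in the text

# Rate needed for one sub-band, then summed over all sub-bands.
per_band_rate = min_sampling_rate_hz(SUB_BAND_WIDTH_HZ)
total_split = NUM_SUB_BANDS * per_band_rate

# Rate needed if the whole 24 kHz band were sampled as a single band.
total_single = min_sampling_rate_hz(NUM_SUB_BANDS * SUB_BAND_WIDTH_HZ)

print(per_band_rate, total_split, total_single)
```

Both totals come to 48 000 samples per second, which is the point of the paragraph: splitting the band does not raise the overall sample count.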

Once the source signal has been split into critical sub-bands, the next step is to determine the amount of masking in each sub-band and its effect on adjacent bands – the so-called mask-to-noise ratio (MNR). This makes extensive use of the two psychoacoustic masking effects of the ear discussed above to govern the appropriate quantisation levels to be used in each different frequency sub-band. Collectively these define the masking threshold, which determines which frequencies will and will not be coded.

If the signal level in a sub-band is below the masking threshold, it is not encoded; if it is above the threshold, it is coded using variable bit-rate (VBR) coding. In VBR, the number of bits allocated to represent each frequency component is based upon the level of quantisation noise. In digital audio, each additional bit improves the S/N ratio by approximately 6 dB, so the more bits allocated, the higher the S/N ratio.
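The figure of roughly 6 dB per bit follows from the fact that each extra bit doubles the number of quantisation levels, and a doubling of amplitude resolution corresponds to 20 log₁₀ 2 ≈ 6.02 dB. The sketch below illustrates that rule of thumb only. It is not the bit-allocation procedure of a real MP3 encoder, and the function names and the 40 dB target are hypothetical.

```python
import math


def snr_gain_per_bit_db() -> float:
    """S/N improvement from one extra bit: 20*log10(2), about 6.02 dB."""
    return 20 * math.log10(2)


def approx_snr_db(bits: int) -> float:
    """Rule-of-thumb S/N ratio for a given word length (constant offsets ignored)."""
    return bits * snr_gain_per_bit_db()


def bits_needed(target_snr_db: float) -> int:
    """Smallest whole number of bits whose rule-of-thumb S/N meets the target."""
    return math.ceil(target_snr_db / snr_gain_per_bit_db())


print(round(snr_gain_per_bit_db(), 2))  # 6.02
print(round(approx_snr_db(16), 1))      # 96.3
print(bits_needed(40))                  # 7 (hypothetical 40 dB target)
```

Under this approximation, a sub-band that needs a higher S/N ratio is given more bits, and one that needs less is given fewer.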

As an example, Table 3.1 shows the output levels of the first 12 critical sub-bands at a specific instant for an MP3 encoder. The output levels indicate the extent to which the level in any particular sub-band exceeds the threshold of hearing in that sub-band. If the output level were 0 in any sub-band, encoding would not be required in that sub-band because the output level would be on the threshold of audibility.

Table 3.1: Outputs from a sub-band MP3 encoder filter
Critical Sub-band	1	2	3	4	5	6	7	8	9	10	11	12
Output level/dB	18	14	42	58	12	5	10	8	6	1	4	2

Sub-band 4 has a high output level of 58 dB. Suppose this produces an effective masking threshold of 16 dB to sub-band 5. As 16 dB exceeds sub-band 5’s output level of 12 dB, sub-band 5 does not need to be encoded in the time period covered by these output levels.

Activity 3.3 Self assessment

Suppose sub-band 4 produces an effective masking threshold of 20 dB to sub-band 3. Does sub-band 3 need to be encoded?

Answer

The output level of sub-band 3 is 42 dB, which is above the masking threshold of 20 dB provided by sub-band 4, so this sub-band needs to be encoded.
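The comparison used in the worked example and in Activity 3.3 can be sketched in Python. The output levels are those of Table 3.1 and the masking thresholds (16 dB and 20 dB) are the ones supposed in the text. The names are illustrative, and treating a level exactly equal to the threshold as not needing encoding is an assumption, since the text covers only the 'above' and 'below' cases.

```python
# Output levels in dB for critical sub-bands 1 to 12, taken from Table 3.1.
OUTPUT_LEVELS_DB = {
    1: 18, 2: 14, 3: 42, 4: 58, 5: 12, 6: 5,
    7: 10, 8: 8, 9: 6, 10: 1, 11: 4, 12: 2,
}


def needs_encoding(sub_band: int, masking_threshold_db: float) -> bool:
    """Return True if the sub-band's output level is above the masking threshold.

    Assumption: a level exactly equal to the threshold is treated as masked.
    """
    return OUTPUT_LEVELS_DB[sub_band] > masking_threshold_db


# Sub-band 5 against the 16 dB threshold supposed in the worked example.
print(needs_encoding(5, 16))  # False: 12 dB is below 16 dB

# Sub-band 3 against the 20 dB threshold supposed in Activity 3.3.
print(needs_encoding(3, 20))  # True: 42 dB is above 20 dB
```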
