
Exploring communications technology


MP3 continued

The last activity showed that sub-band 3 is not, in this instance, masked by the loud sound in sub-band 4. However, raising the threshold by 20 dB means that, for encoding purposes, the level of sub-band 3 is effectively reduced. Specifically, as the output level of sub-band 3 exceeds the raised threshold by (42 – 20) dB, the effective level that needs to be encoded is only 22 dB. In the VBR encoding used in MP3, 1 bit is allocated per 6 dB of level above the threshold. This means that sub-band 3, which exceeds the threshold by 22 dB, needs an allocation of 4 bits to encode this sample. An allocation of 3 bits would be insufficient, as 3 × 6 dB = 18 dB, which is below the level of 22 dB, whereas 4 × 6 dB = 24 dB, which is above 22 dB.
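The bit-allocation arithmetic above can be expressed as a short Python sketch. It is a simplified illustration that assumes only the rule of 1 bit per 6 dB above the threshold stated in the text; the function name `bits_needed` is hypothetical, and a real MP3 encoder's bit allocation is considerably more involved.

```python
import math

DB_PER_BIT = 6  # rule of thumb from the text: each bit covers about 6 dB


def bits_needed(level_db: float, threshold_db: float) -> int:
    """Bits needed for a sub-band whose level exceeds the masking threshold.

    Returns 0 when the sub-band is at or below the threshold (fully masked).
    """
    excess_db = level_db - threshold_db
    if excess_db <= 0:
        return 0  # masked: nothing needs to be encoded
    # Round up: the bits must cover the whole excess, not just most of it
    return math.ceil(excess_db / DB_PER_BIT)


# Sub-band 3 from the example: 42 dB output level, threshold raised to 20 dB
print(bits_needed(42, 20))  # → 4
```

Rounding up with `math.ceil` mirrors the argument in the text: 3 bits cover only 18 dB, so the 22 dB excess needs a fourth bit.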

The procedure outlined here has to be carried out across all sub-bands where there is frequency masking from other sub-bands. The effect of temporal masking also has to be taken into account. These processes have to be repeated to cover the entire duration of the recording.

MP3 achieves high-quality audio reproduction at 128 kbit s⁻¹. This contrasts markedly with the CD bit rate of 1.4112 Mbit s⁻¹. MP3 generally achieves 10:1 compression without introducing notable subjective effects into the reconstructed sound. Incidentally, it is common to refer to compressed audio files in terms of a bit rate in kbit s⁻¹ or Mbit s⁻¹ rather than as an actual file size. The reason for this convention is that MP3 and other audio formats are extensively used in streaming applications where the emphasis is on throughput and quality of service (QoS) rather than storage capacity.
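The CD figure quoted above follows from the standard CD audio format: 44.1 kHz sampling, 16 bits per sample and two channels. The short Python calculation below reproduces it, compares it with 128 kbit s⁻¹ and, to illustrate why a bit rate is a convenient way to describe a file, works out the size of a hypothetical 3-minute track.

```python
# Standard CD audio parameters
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

cd_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # bit/s
mp3_bit_rate = 128_000  # bit/s

print(cd_bit_rate)                 # → 1411200, i.e. 1.4112 Mbit/s
print(cd_bit_rate / mp3_bit_rate)  # → 11.025

# Size of a hypothetical 3-minute track at 128 kbit/s, in megabytes (10^6 bytes)
duration_s = 3 * 60
size_mb = mp3_bit_rate * duration_s / 8 / 1e6
print(size_mb)  # → 2.88
```

The exact ratio at 128 kbit s⁻¹ is therefore about 11:1, in line with the rough 10:1 figure quoted above.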

Most MP3 recordings are made at 128 kbit s⁻¹, which provides sufficient audio quality that most listeners (apart from hi-fi buffs) would not notice the difference. As the bit rate drops to 64 kbit s⁻¹, however, the loss becomes much more perceptible at the top (treble) end. The bass response also tends to degrade, and higher frequencies take on a distinctly artificial, digital tone. The reason for this is that the MP3 developers decided to limit the audio bandwidth to approximately 16 kHz at 128 kbit s⁻¹ and to only approximately 8 kHz at 64 kbit s⁻¹.

Activity 3.4 Exploratory

This activity, ‘Perceptual sensitivity and masking’, allows you to explore some audio examples of the relative hearing sensitivity of the ear, as well as frequency and temporal perceptual masking effects.

Frequency masking

To demonstrate frequency masking, you will hear a relatively loud sine-wave tone (440 Hz) masking a quieter tone at a different frequency (652 Hz). The image below provides a visual representation of the audio clip: the horizontal direction represents time, the vertical direction represents amplitude, and the green shapes are the envelopes of the sine-wave tones. The sine waves are too closely packed for their cycles to be visible. You will probably find it helpful to look at this image while you play the audio clip.


The diagram is a screenshot from audio-editing software showing a graphical display of sounds. In this display the horizontal axis represents time. The sounds are shown as solid blocks of green. The height of the block represents the sound’s amplitude and its width represents its duration. Each block of sound is labelled with a letter. The letters go from A to I.

The first sound shown is a tall block of short duration, labelled A. This is two seconds of the 440 hertz masking tone. After this sound there is a short period of silence, and then a very shallow sound labelled B. This is two seconds of the quiet 652 hertz masked tone. This is followed by a short gap of silence.

The next sound is as tall as the first and is labelled C. This represents four seconds of the 440 hertz masking tone. This merges into a very slightly taller section labelled D, which consists of both the 440 hertz masking tone and the quiet 652 hertz masked tone. After four seconds the amplitude of this sound decreases over a period of about four seconds. This period of decreasing amplitude is labelled E. The amplitude decreases to the level of B, the 652 hertz masked tone, indicating that the 440 hertz masking tone has been reduced to zero, leaving just the 652 hertz masked tone.

The next section is F, and merges with E before it. It represents the 652 hertz masked tone. After four seconds of this, the amplitude of the block gradually gets bigger, indicating that the 440 hertz masking tone is brought back in and increases in volume. This part is labelled G. The amplitude reaches the same value as section D, and stays at this amplitude for four seconds. This section is labelled H. After four seconds the amplitude drops very slightly, indicating that the 652 hertz tone has dropped out leaving just the 440 hertz masking tone. This section is labelled I.

The first two sounds in the clip are simply to familiarise you with the masking tone (440 Hz, shown in the figure at A) and quieter masked tone (652 Hz, shown at B). You will hear 2 seconds of each.

The masking demonstration follows. There are 4 seconds of the 440 Hz masking tone (C). The quieter 652 Hz tone is then added, and the two tones are played for 4 seconds (D). When you play the audio clip, try to identify whether you can hear the 652 Hz tone during this part.

The 440 Hz masking tone then fades out (E), leaving just the 652 Hz tone for 4 seconds (F). Finally, the 440 Hz masking tone gradually fades back in (G) and should eventually mask the 652 Hz tone (H). The audio clip ends with 4 seconds of just the 440 Hz tone again (I).
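To make the sequence A–I concrete, here is a minimal Python sketch that synthesises a similar test signal using the standard-library `wave` module. The sample rate, tone amplitudes, gap lengths and the 4 s duration of the fade-in G are assumptions, since the course gives only the frequencies and the other segment durations; the helper names `build_samples` and `write_wav` are hypothetical.

```python
import math
import sys
import wave
from array import array

RATE = 44_100              # samples per second (assumed)
MASKER_HZ = 440.0          # loud masking tone
MASKED_HZ = 652.0          # quieter masked tone
MASKER_AMP = 0.5           # assumed level; the course does not give one
MASKED_AMP = 0.02          # assumed level, well below the masker

# (label, duration in s, masker gain at start, masker gain at end, masked tone on?)
# The masker gain ramps linearly within a segment, producing the fades E and G.
# Gap lengths (0.5 s) and the duration of G (4 s) are assumptions.
SEGMENTS = [
    ("A", 2.0, 1.0, 1.0, False),
    ("gap", 0.5, 0.0, 0.0, False),
    ("B", 2.0, 0.0, 0.0, True),
    ("gap", 0.5, 0.0, 0.0, False),
    ("C", 4.0, 1.0, 1.0, False),
    ("D", 4.0, 1.0, 1.0, True),
    ("E", 4.0, 1.0, 0.0, True),
    ("F", 4.0, 0.0, 0.0, True),
    ("G", 4.0, 0.0, 1.0, True),
    ("H", 4.0, 1.0, 1.0, True),
    ("I", 4.0, 1.0, 1.0, False),
]


def build_samples():
    """Return the whole demonstration as a list of floats in [-1, 1]."""
    samples = []
    n_total = 0  # running sample index, so each sine wave stays phase-continuous
    for _label, duration, gain_start, gain_end, masked_on in SEGMENTS:
        n = int(duration * RATE)
        for i in range(n):
            gain = gain_start + (gain_end - gain_start) * i / n
            t = n_total / RATE
            x = gain * MASKER_AMP * math.sin(2 * math.pi * MASKER_HZ * t)
            if masked_on:
                x += MASKED_AMP * math.sin(2 * math.pi * MASKED_HZ * t)
            samples.append(x)
            n_total += 1
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM; peak level is MASKER_AMP + MASKED_AMP, so no clipping."""
    pcm = array("h", (int(s * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(pcm.tobytes())
```

Calling `write_wav("masking_demo.wav", build_samples())` writes a 33-second mono file. Because the segments switch tones on and off abruptly, you may hear faint clicks at the boundaries; a polished demonstration would normally ramp each change over a few milliseconds.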

Play the audio clip now.


I heard a relatively loud single tone followed by a quieter, higher tone. When the two tones were played together, I heard only the louder, lower tone. However, as the lower tone decreased in volume, the quieter, higher tone seemed to fade in until I could hear it clearly. Similarly, as the lower tone faded back in and grew louder again, the higher tone seemed to fade out until I was no longer able to hear it. Thus although the quieter, higher tone was playing all along, I was only able to perceive it when the louder, lower tone was sufficiently quiet.


Temporal masking

In temporal masking, a loud sound makes a closely following sound inaudible. The effect is most noticeable when the following sound is relatively quiet and comes after only a very short gap. The image below shows the sequence of sounds used in each of the demonstration audio clips.


The diagram shows two blocks of sound, with a gap between them. The first block of sound has a large amplitude and a relatively long duration. Its duration is actually one second, but it occupies most of the diagram. This block is labelled ‘loud 632 hertz tone’. After this comes a short gap of silence labelled ‘gap’. This is followed by a short, low-amplitude burst of sound labelled ‘quieter 632 hertz tone’.

In each of the four audio clips below, a relatively long, large-amplitude 632 Hz tone is followed by a gap, and then a quieter version of the same tone.

In the first clip, the gap between the two tones is fairly long (60 ms). The tone after the gap is audible as a very short blip (like a faint echo) after the main tone.

In successive clips, the gap gets shorter. You should find that the final blip becomes inaudible as the gap decreases to 10 ms.
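The four clips differ only in the length of the silent gap. The Python sketch below builds one clip for a given gap, assuming a 44.1 kHz sample rate, arbitrary tone levels and a 30 ms quiet tone (the course states only the frequency, the one-second loud tone and the gap lengths); the function names are hypothetical. It also shows how few samples these gaps occupy.

```python
import math

RATE = 44_100            # samples per second (assumed)
TONE_HZ = 632.0          # both tones use the same frequency
LOUD_AMP = 0.5           # assumed levels; the course does not state them
QUIET_AMP = 0.05
LOUD_S = 1.0             # the loud tone lasts one second
QUIET_S = 0.03           # assumed length of the short, quiet tone


def tone(amp, seconds):
    """A sine burst at TONE_HZ with the given amplitude and duration."""
    n = round(seconds * RATE)
    return [amp * math.sin(2 * math.pi * TONE_HZ * i / RATE) for i in range(n)]


def gap_samples(gap_ms):
    """Number of silent samples for a gap in milliseconds (integer arithmetic)."""
    return gap_ms * RATE // 1000


def temporal_masking_clip(gap_ms):
    """Loud tone, then gap_ms of silence, then a quieter copy of the same tone."""
    return tone(LOUD_AMP, LOUD_S) + [0.0] * gap_samples(gap_ms) + tone(QUIET_AMP, QUIET_S)


for gap_ms in (60, 40, 20, 10):
    print(f"gap {gap_ms} ms = {gap_samples(gap_ms)} samples")
# gap 60 ms = 2646 samples
# gap 40 ms = 1764 samples
# gap 20 ms = 882 samples
# gap 10 ms = 441 samples
```

At these assumed settings the 10 ms gap is only 441 samples long, yet it is enough to change whether the quiet tone is heard.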


Gap = 60 ms


Gap = 40 ms


Gap = 20 ms


Gap = 10 ms
