
Exploring communications technology


3.4 Perceptual encoding

Sounds of certain frequencies, and certain colours, are perceived better than others. Exploiting this fact when encoding the source can often give useful reductions in file size or data rate. MP3 music files, for example, are typically one-tenth of the size of equivalent uncompressed music files (such as CD files).
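The one-tenth figure can be checked with a little arithmetic. The sketch below uses the standard CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels) and assumes an MP3 bit rate of 128 kbit/s. That bit rate is a common choice but is an assumption made here, not a figure from this course, and the function names are illustrative only.

```python
# Standard parameters for CD audio.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Assumed MP3 bit rate (a common setting, not taken from the course text).
MP3_BIT_RATE = 128_000


def cd_bit_rate():
    """Bit rate of uncompressed CD audio in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(mp3_bit_rate=MP3_BIT_RATE):
    """How many times smaller the MP3 stream is than the CD stream."""
    return cd_bit_rate() / mp3_bit_rate


def file_size_mb(bit_rate, seconds):
    """File size in megabytes (10**6 bytes) for a given bit rate and duration."""
    return bit_rate * seconds / 8 / 1_000_000


if __name__ == "__main__":
    print(cd_bit_rate())                      # bits per second for CD audio
    print(round(compression_ratio(), 1))      # CD rate divided by MP3 rate
    print(file_size_mb(cd_bit_rate(), 180))   # three-minute track, uncompressed
    print(file_size_mb(MP3_BIT_RATE, 180))    # three-minute track, MP3
```

Under these assumptions the CD stream runs at about 1.4 Mbit/s, roughly eleven times the assumed MP3 rate, which is consistent with the one-tenth figure quoted above.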

Humans are more sensitive to frequencies in the range 1 to 5 kHz than to those outside this range. This is shown in Figure 3.4. The red line is the threshold of hearing. Sounds below the threshold are inaudible. The threshold is lowest between 1 and 5 kHz, and rises above 5 kHz and below 1 kHz. At these higher and lower frequencies, the quietest audible sounds are louder than the quietest audible sounds between 1 kHz and 5 kHz.

In Figure 3.4, two single-frequency tones A and B are shown with the same amplitude, but A is audible and B is inaudible.

Figure 3.4 Hearing sensitivity threshold response curve of the human ear with two equal-amplitude frequency tones, A and B


This line graph has a Y axis labelled ‘relative signal amplitude’, measured in decibels, and has points minus 100 (bottom) and 0 (top) marked.

The X axis is labelled as frequency, measured in kilohertz. The scale is not linear; it is closer to a logarithmic scale. From left to right it has points 0.02, 1, 5 and 20 marked.

A smooth analogue waveform is shown, starting at a position 0 on the Y axis. It is labelled ‘hearing sensitivity threshold’. It curves and drops smoothly to a trough at position 1 on the X axis and minus 100 on the Y axis. It remains at minus 100 until about position 5 on the X axis, then rises sharply (note also the logarithmic scale) to a maximum of 0 on the Y axis at position 20 on the X axis.

Two blue arrows labelled A and B are shown, starting at minus 100 on the Y axis, both pointing upwards. B is positioned just to the right of position 0.02 on the X axis. A is positioned to the right of B. The two arrows are of equal amplitude (height). B does not cross the hearing sensitivity threshold curve, but A does.
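The comparison between tones A and B can be sketched numerically. The code below uses a widely quoted approximation to the threshold of hearing in quiet (usually attributed to Terhardt). It is not the curve plotted in Figure 3.4: it gives levels in dB SPL rather than on the relative scale of the figure, and the two tone frequencies and the 20 dB level are assumptions chosen only for illustration.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL at a given frequency.

    A commonly used analytic approximation, not the curve in Figure 3.4.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(freq_hz, level_db):
    """A single tone is audible if its level exceeds the threshold."""
    return level_db > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    level_db = 20.0       # assumed: the same level for both tones
    tone_a_hz = 3000.0    # assumed: inside the sensitive 1 to 5 kHz range
    tone_b_hz = 50.0      # assumed: near the low-frequency end of the axis
    print("A audible:", is_audible(tone_a_hz, level_db))
    print("B audible:", is_audible(tone_b_hz, level_db))
```

With these assumed values, tone A lies above the threshold and tone B lies below it, even though both have the same level.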


A relatively loud sound at a particular frequency reduces our sensitivity to neighbouring frequencies. This is frequency masking. Figure 3.5 shows a loud sound A raising the perceptual hearing threshold in its vicinity. Sound B, which would otherwise be audible, is made inaudible. Under these circumstances, it would be unnecessary to encode sound B.

Figure 3.5 Frequency masking for two single-tone frequencies, A and B, with A louder than B
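The decision not to encode sound B can be sketched with a toy model. The code below assumes that a loud tone raises the threshold symmetrically around its own frequency, with the raised threshold falling off linearly with distance in octaves. The flat 0 dB threshold in quiet, the 10 dB offset and the 30 dB per octave slope are all hypothetical values chosen for illustration; the psychoacoustic models used in real encoders are considerably more detailed.

```python
import math


def masked_threshold_db(freq_hz, masker_hz, masker_db,
                        quiet_db=0.0, offset_db=10.0,
                        slope_db_per_octave=30.0):
    """Threshold at freq_hz in the presence of a loud tone (toy model).

    All default parameter values are illustrative assumptions.
    """
    octaves_away = abs(math.log2(freq_hz / masker_hz))
    raised = masker_db - offset_db - slope_db_per_octave * octaves_away
    # The masker can only raise the threshold, never lower it.
    return max(quiet_db, raised)


def needs_encoding(freq_hz, level_db, masker_hz, masker_db):
    """A tone is worth encoding only if it exceeds the masked threshold."""
    return level_db > masked_threshold_db(freq_hz, masker_hz, masker_db)


if __name__ == "__main__":
    # Sound A: loud masker at 1 kHz. Sound B: quieter tone nearby.
    print(needs_encoding(1100.0, 40.0, masker_hz=1000.0, masker_db=70.0))
    # The same quiet tone two octaves away from the masker.
    print(needs_encoding(4000.0, 40.0, masker_hz=1000.0, masker_db=70.0))
```

In this sketch the 40 dB tone close to the masker falls below the raised threshold and can be dropped, while the same tone two octaves away is unaffected and still has to be encoded.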

Another form of masking is temporal masking. This arises because our sensitivity to sounds in a narrow frequency range is reduced for a short period before and after the presence of a relatively strong sound in that frequency range. You may be surprised that sensitivity can be reduced before as well as after a relatively loud sound. This is a result of the way the auditory system and brain process audio information.

Following a loud sound, it takes the ear up to 50 ms to be able to respond again to a much quieter sound. The resulting temporal masking envelope is displayed in Figure 3.6. The shaded region represents inaudible signal amplitudes following a very strong signal at time T.

Figure 3.6 Temporal masking effect of a loud sound at T and resulting inaudible envelope


This line graph has a Y axis labelled ‘relative signal amplitude’, measured in decibels, and has points minus 80 (bottom) and 0 (top) marked.

The X axis is labelled as time, measured in milliseconds. It has points T and T plus 50 marked. A long vertical arrow, pointing upwards, is positioned at T, extending from Y equals minus 80 to Y equals 0. Four short, parallel vertical arrows are shown close to the T plus 50 position. The first is about one quarter of the length of the long arrow, the second about half the length, the third about a third of the length and the fourth about one quarter of the length of the long arrow.

The waveform starts at position minus 80 on the Y axis, at a position just before T on the X axis. It rises as a straight diagonal line, reaching its peak value of Y equals 0 just after time T. It then remains constant at a relative signal amplitude of 0 decibels until approximately T plus 30. It then drops as a smooth curve to a minimum at T plus 50.

The area under the waveform from just before time T through to time T plus 50 is shaded and labelled ‘inaudible signal amplitudes’.
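The envelope described for Figure 3.6 can be approximated in code. The sketch below takes the plateau (until about T plus 30 ms) and the end point (T plus 50 ms) from the figure description, but the 5 ms pre-masking ramp, the peak placed exactly at T and the cosine shape of the decay are assumptions made here, since the figure gives no exact values for them.

```python
import math

FLOOR_DB = -80.0   # bottom of the Y axis in Figure 3.6
PEAK_DB = 0.0      # level of the loud sound at time T
PRE_MS = 5.0       # assumed length of the ramp before T
HOLD_MS = 30.0     # plateau lasts until about T + 30 ms
END_MS = 50.0      # masking has ended by T + 50 ms


def masking_envelope_db(t_ms):
    """Masked level in dB at time t_ms, measured relative to T (sketch)."""
    if t_ms < -PRE_MS or t_ms > END_MS:
        return FLOOR_DB
    if t_ms < 0.0:
        # Straight-line rise before the loud sound (pre-masking).
        frac = (t_ms + PRE_MS) / PRE_MS
        return FLOOR_DB + frac * (PEAK_DB - FLOOR_DB)
    if t_ms <= HOLD_MS:
        return PEAK_DB
    # Smooth fall from the plateau to the floor (assumed cosine shape).
    frac = (t_ms - HOLD_MS) / (END_MS - HOLD_MS)
    return FLOOR_DB + (PEAK_DB - FLOOR_DB) * 0.5 * (1.0 + math.cos(math.pi * frac))


def is_masked(t_ms, level_db):
    """A sound is inaudible if its level lies under the envelope."""
    return level_db < masking_envelope_db(t_ms)


if __name__ == "__main__":
    for t in (-10.0, 10.0, 45.0, 60.0):
        print(t, is_masked(t, -20.0))
```

A sound at minus 20 dB is masked shortly after T in this sketch, but becomes audible again as the envelope decays towards T plus 50 ms.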

