3.4 Perceptual encoding
Human perception is not uniform: some sound frequencies are heard, and some colours are seen, more readily than others. Useful reductions of file size or data rate can often be achieved if this fact is exploited during encoding of the source. MP3 music files, for example, are typically one-tenth of the size of the equivalent uncompressed music files (such as CD files).
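The one-tenth figure can be checked with a little arithmetic. The sketch below compares the standard CD audio data rate (44.1 kHz sampling, 16 bits per sample, two channels) with an MP3 bit rate of 128 kbit/s. The MP3 rate is an assumption chosen for illustration, since the text does not specify one, and the function names are hypothetical.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Assumed MP3 bit rate (a common setting, not a value taken from the text).
MP3_BIT_RATE = 128_000


def cd_bit_rate():
    """Return the uncompressed CD audio data rate in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(encoded_bit_rate):
    """Return how many times smaller the encoded stream is than CD audio."""
    return cd_bit_rate() / encoded_bit_rate


print(cd_bit_rate())                              # 1411200
print(round(compression_ratio(MP3_BIT_RATE), 1))  # 11.0
```

Under these assumptions the encoded stream is roughly eleven times smaller than the CD stream, which is consistent with the one-tenth figure. A higher MP3 bit rate would give a smaller ratio.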
Humans are more sensitive to frequencies in the range 1 to 5 kHz than to those outside this range. This is shown in Figure 3.4. The red line is the threshold of hearing. Sounds below the threshold are inaudible. The threshold is lowest between 1 and 5 kHz. It rises above 5 kHz and below 1 kHz. At these frequencies, the quietest audible sounds are louder than the quietest audible sounds between 1 kHz and 5 kHz.
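In Figure 3.4, two single-frequency tones A and B are shown with the same amplitude, but A is audible and B is inaudible. This is because A lies above the threshold of hearing at its frequency, whereas B lies at a frequency where the threshold is higher than the tone's amplitude.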
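The effect of the threshold on two equal-amplitude tones can be sketched numerically. The code below uses Terhardt's widely quoted approximation to the threshold of hearing in quiet. This formula is not taken from the text and is only one possible model of the curve in Figure 3.4. The tone frequencies (3 kHz and 100 Hz) and the level of 10 dB are assumptions chosen for illustration, as are the function names.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db):
    """A tone is audible only if its level exceeds the threshold."""
    return level_db > threshold_in_quiet_db(freq_hz)


# Two tones with the same level but different frequencies (assumed values).
print(is_audible(3000, 10))  # True: the threshold is low near 3 kHz
print(is_audible(100, 10))   # False: the threshold is much higher at 100 Hz
```

An encoder that applies a test of this kind need not spend bits on a component such as the 100 Hz tone here, because a listener would not hear it.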
A relatively loud sound at a particular frequency reduces our sensitivity to neighbouring frequencies. This is frequency masking. Figure 3.5 shows a loud sound A raising the perceptual hearing threshold in its vicinity. Sound B, which would otherwise be audible, is made inaudible. Under these circumstances, it would be unnecessary to encode sound B.
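Frequency masking can be illustrated with a toy model. In the sketch below, a masker raises the threshold to a level just below its own, and the raised threshold falls away at a fixed number of decibels per octave on either side. The triangular shape, the 10 dB offset, the 30 dB per octave slope and all function names are assumptions made for illustration. Real masking curves are not symmetric, and practical encoders use more elaborate models.

```python
import math


def masked_threshold_db(freq_hz, masker_freq_hz, masker_level_db,
                        offset_db=10.0, slope_db_per_octave=30.0):
    """Threshold raised by a masker, falling off with distance in octaves.

    The offset and slope are illustrative values, not measured ones.
    """
    octaves = abs(math.log2(freq_hz / masker_freq_hz))
    return masker_level_db - offset_db - slope_db_per_octave * octaves


def is_audible(freq_hz, level_db, quiet_threshold_db, masker=None):
    """Compare a tone with the higher of the quiet and masked thresholds."""
    threshold = quiet_threshold_db
    if masker is not None:
        masker_freq_hz, masker_level_db = masker
        threshold = max(threshold,
                        masked_threshold_db(freq_hz, masker_freq_hz,
                                            masker_level_db))
    return level_db > threshold


sound_a = (1000, 70)  # loud masker: 1 kHz at 70 dB (assumed values)

# Sound B: 1.2 kHz at 30 dB, with an assumed quiet threshold of 5 dB.
print(is_audible(1200, 30, 5))                  # True: audible on its own
print(is_audible(1200, 30, 5, masker=sound_a))  # False: masked by A
print(is_audible(4000, 30, 5, masker=sound_a))  # True: too far from A
```

The second result corresponds to the situation in Figure 3.5: sound B is above the threshold in quiet but below the threshold raised by sound A, so it need not be encoded.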
Another form of masking is temporal masking. This arises because our sensitivity to sounds in a narrow frequency range is reduced for a short period before and after the presence of a relatively strong sound in that frequency range. You may be surprised that sensitivity can be reduced before as well as after a relatively loud sound. This is a result of the way the auditory system and brain process audio information.
Following a loud sound, it takes the ear up to 50 ms to be able to respond again to a much quieter sound. The resulting temporal masking envelope is displayed in Figure 3.6. The shaded region represents inaudible signal amplitudes following a very strong signal at time T.
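The recovery of the ear after a loud sound can be sketched in the same way. The code below models only the masking that follows the loud sound, using the 50 ms recovery time given above. The straight-line decay (in decibels) from the masker level down to the threshold in quiet is an assumed shape, not the envelope of Figure 3.6, and the levels and function names are hypothetical. Masking before the loud sound is left out because the text gives no duration for it.

```python
def post_masking_threshold_db(t_ms, masker_level_db, quiet_threshold_db,
                              recovery_ms=50.0):
    """Threshold t_ms after a loud sound ends (assumed linear decay in dB)."""
    if t_ms < 0:
        raise ValueError("t_ms must be zero or positive")
    if t_ms >= recovery_ms:
        return quiet_threshold_db  # the ear has fully recovered
    fraction = t_ms / recovery_ms
    return masker_level_db + fraction * (quiet_threshold_db - masker_level_db)


def is_audible_after(t_ms, level_db, masker_level_db, quiet_threshold_db):
    """A quiet sound is audible only once the threshold has dropped below it."""
    threshold = post_masking_threshold_db(t_ms, masker_level_db,
                                          quiet_threshold_db)
    return level_db > threshold


# A 30 dB sound following an 80 dB sound, with an assumed 5 dB quiet threshold.
print(is_audible_after(10, 30, 80, 5))  # False: still masked after 10 ms
print(is_audible_after(40, 30, 80, 5))  # True: threshold has fallen below 30 dB
print(is_audible_after(60, 30, 80, 5))  # True: beyond the 50 ms recovery time
```

In this sketch, any sound whose level lies under the decaying threshold falls in the region that is inaudible after the loud sound, so an encoder could discard it.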