
Hearing

Introduction

This course examines the basic mechanisms responsible for our ability to hear. Humans are able to distinguish a remarkable range of sounds and hearing provides us with a unique source of information about what is occurring in our immediate surroundings. Our sense of hearing depends entirely on the sensory receptors of the inner ear known as hair cells. Hair cells are extremely vulnerable and can be affected by disease, ageing and over-exposure to loud noise. Once destroyed, they do not regenerate. In this course we describe in detail the function of the cochlea, which is where the hair cells are located. We learn how sound energy is transduced into electrical signals and how a rapid-fire code of electrical impulses about the physical characteristics of a particular sound is sent to the brain. The brain interprets these signals as a musical phrase, a human voice or any of the range of sounds in the world around us at a particular moment. We also examine the central auditory nervous system pathways and describe the physiological mechanisms responsible for our sense of pitch and loudness and our ability to localise the source of a sound stimulus. Finally, we look at the main types of hearing impairment and their causes, effects and rehabilitation.

This OpenLearn course is an adapted extract from the Open University course SD329 Signals and perception: the science of the senses.

Learning outcomes

After studying this course, you should be able to:

  • distinguish between the major anatomical components of the outer, middle and inner ear

  • describe the function of the outer, middle and inner ear

  • describe the structure of the cochlea

  • describe the structural arrangements of the organ of Corti and the function of the basilar membrane

  • describe the main causes of hearing impairments and the methods used to rehabilitate hearing-impaired individuals.

1 Sound reception: the ear

In order to hear a sound, the auditory system must accomplish three basic tasks. First, it must deliver the acoustic stimulus to the receptors; second, it must transduce the stimulus from pressure changes into electrical signals; and third, it must process these electrical signals so that they can efficiently indicate the qualities of the sound source such as pitch, loudness and location. How the auditory system accomplishes these tasks is the subject of much of the rest of this course. We will begin by describing the basic structure of the ear, which carries out the first of the three tasks.

The human ear can be divided into three fairly distinct components according to both anatomical position and function: the outer ear, which is responsible for gathering sound energy and funnelling it to the eardrum; the middle ear which acts as a mechanical transformer; and the inner ear where the auditory receptors (hair cells) are located.

2 Structure and function

2.1 Structure and function of the outer and middle ear

Figure 1 is a diagram of the human ear. The outer ear consists of the visible part of the ear or pinna, the external auditory canal (meatus), and the tympanic membrane (tympanum) or eardrum. The human pinna is formed primarily of cartilage and is attached to the head by muscles and ligaments. The deep central portion of the pinna is called the concha, which leads into the external auditory canal, which in turn leads to the tympanic membrane.

Figure 1 The human ear showing the outer, middle and inner ears

Only mammals have pinnae and only some have mobile pinnae. The pinnae of humans and other primates have no useful muscles and are therefore relatively immobile. Mobile and, to some extent, immobile pinnae help in localising sounds by funnelling them towards the external canal.

Activity

How would the immobility of our pinnae affect how we localise a sound source?

Answer

Unlike animals with mobile pinnae, we must reposition our head in order to aim our ears at a sound source.

The pinnae also help in distinguishing between noises originating in front of and behind the head, and in providing other types of filtering of the incoming sound wave. In addition, the concha and external auditory canal effectively enhance the intensity of sound that reaches the tympanic membrane by about 10 to 15 dB. This enhancement is most pronounced for sounds in the frequency range of roughly 2 to 7 kHz and so, in part, determines the frequencies to which the ear is most sensitive. Finally, the outer ear protects the tympanic membrane against foreign bodies and changes in humidity and temperature.
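To put the 10 to 15 dB figure in perspective, a level difference in decibels can be converted into a ratio of sound intensities. The short Python sketch below assumes the standard definition of the decibel for intensity (a difference of L dB corresponds to a ratio of 10 raised to the power L/10); the function name is purely illustrative.

```python
def db_to_intensity_ratio(level_db):
    """Convert a level difference in dB to a ratio of sound intensities."""
    return 10 ** (level_db / 10)

# Enhancement range quoted in the text for the concha and external canal.
for gain_db in (10, 15):
    ratio = db_to_intensity_ratio(gain_db)
    print(f"{gain_db} dB gain -> intensity multiplied by about {ratio:.1f}")
```

On this definition, an enhancement of 10 dB corresponds to a tenfold increase in intensity at the tympanic membrane.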

The external auditory canal extends about 2.5 cm inside the skull before it ends in the tympanic membrane. Sound travels down the meatus and causes the tympanic membrane to vibrate. The tympanic membrane is thin and pliable so that a sound, consisting of compressions and rarefactions of air particles, pulls and pushes at the membrane moving it inwards and outwards at the same frequency as the incoming sound wave. It is this vibration that ultimately leads to the perception of sound. The greater the amplitude of the sound waves, the greater the deflection of the membrane. The higher the frequency of the sound, the faster the membrane vibrates.

On the other side of the tympanic membrane is the middle ear (Figure 1) which is an air-filled chamber containing three interlocking bones called ossicles. These are the smallest bones in the body and function to transmit the vibrations caused by auditory stimulation at the tympanic membrane to the inner ear. The bones are called the malleus (Latin for ‘hammer’), the incus (‘anvil’) and the stapes (‘stirrup’). The ossicle attached to the tympanic membrane is the malleus, which forms a rigid connection with the incus. The incus forms a flexible connection with the stapes. The flat bottom portion of the stapes, the footplate, is connected to the oval window (a second membrane covering a hole in the bone of the skull). In response to sound, the inward-outward movement of the tympanum displaces the malleus and incus and the action of these two bones alternately drives the stapes deeper into the oval window and retracts it, resulting in a cyclical movement of fluid within the inner ear.

This may seem a complex way to transmit vibrations of the tympanic membrane to the oval window. Why must they be transmitted via the ossicular chain and not simply transferred directly?

The reason is that the middle ear cavity is air-filled while the inner ear is fluid-filled. The passage of sound information from the outer to the inner ear involves a boundary between air and fluid. If you have tried talking to someone who is under water, you may have observed that sound does not travel efficiently across this kind of interface. In fact, 99.9 per cent of the sound energy incident on an air/fluid boundary is reflected back within the air medium and only 0.1 per cent is transmitted to the fluid. Therefore, if sound waves were to impinge directly on the oval window, the membrane would barely move. Most of the sound would be reflected back because the fluid in the inner ear is denser than air and resists being moved much more than air does. Consequently, in order to drive the movement of the oval window and vibrate the fluid, greater pressure is needed.
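The 0.1 per cent figure can be checked approximately with the standard expression for energy transmission across a flat boundary at normal incidence. The sketch below is only an illustration: the impedance values are rough textbook figures for air and water (water standing in for the cochlear fluid), not values taken from this course.

```python
import math

def transmitted_fraction(z1, z2):
    """Fraction of incident sound energy transmitted across a flat boundary
    between media with acoustic impedances z1 and z2 (normal incidence)."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

# Assumed, approximate characteristic impedances in Pa·s/m.
Z_AIR = 415.0
Z_WATER = 1.48e6  # water used as a stand-in for the fluid of the inner ear

t = transmitted_fraction(Z_AIR, Z_WATER)
loss_db = -10 * math.log10(t)  # energy loss expressed in decibels

print(f"transmitted: {100 * t:.2f} per cent")
print(f"reflected:   {100 * (1 - t):.2f} per cent")
print(f"transmission loss: roughly {loss_db:.1f} dB")
```

With these assumed impedances, the transmitted fraction comes out close to the 0.1 per cent quoted above, which corresponds to a loss of roughly 30 dB.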

The middle ear provides two ways of doing this. The first is to do with the relative sizes of the tympanic membrane and the stapes footplate (which is connected to the oval window). Measurements have shown that the area of the tympanic membrane that vibrates in response to high intensity sound is 55 mm². The stapes footplate which makes contact with the oval window has an area of only about 3.2 mm². So, if all the force exerted on the tympanic membrane is transferred to the stapes footplate, then the pressure (force per unit area) must be greater at the footplate because it is smaller than the tympanic membrane. One rather painful demonstration of this principle is to compare the pressure exerted on your toe by someone wearing a stiletto heel with the pressure exerted by the same person wearing an ordinary trainer.
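The size of this effect follows directly from the two areas. The sketch below assumes, as the text does, that all the force on the tympanic membrane reaches the footplate, so that the pressure rises by the ratio of the areas; expressing the result in decibels uses the usual 20 log10 rule for pressure ratios.

```python
import math

TYMPANIC_AREA_MM2 = 55.0   # vibrating area of the tympanic membrane (from the text)
FOOTPLATE_AREA_MM2 = 3.2   # area of the stapes footplate (from the text)

# Pressure is force divided by area, so the same force on a smaller area
# gives a pressure larger by the ratio of the two areas.
pressure_gain = TYMPANIC_AREA_MM2 / FOOTPLATE_AREA_MM2
gain_db = 20 * math.log10(pressure_gain)

print(f"pressure at footplate is about {pressure_gain:.1f} times greater")
print(f"equivalent to about {gain_db:.1f} dB")
```

On these figures the area difference alone increases the pressure roughly 17-fold.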

The second way in which the middle ear ossicles transfer the force from the tympanic membrane to the stapes footplate is through the lever action of the ossicles. Figure 2 shows how a lever system can increase the force of an incoming signal.

Figure 2 The lever action of the middle ear. The length of the malleus corresponds to D1, the distance between the applied force and the fulcrum. The length of the incus corresponds to D2, the distance between the fulcrum and the resultant force. If D2 is less than D1, then the resultant force will be greater than the applied force
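The lever principle in Figure 2 can be written as a balance of turning effects about the fulcrum: applied force × D1 = resultant force × D2. The sketch below treats the ossicles as an ideal, frictionless lever. The course gives no lengths for the malleus and incus, so the values of D1 and D2 used here are hypothetical and chosen only to show the direction of the effect.

```python
def lever_output_force(applied_force, d1, d2):
    """Ideal lever: applied_force * d1 = resultant_force * d2."""
    return applied_force * d1 / d2

# Hypothetical lengths in arbitrary units (not taken from the course).
D1 = 1.3  # distance from applied force to fulcrum (malleus side)
D2 = 1.0  # distance from fulcrum to resultant force (incus side)

applied = 1.0
resultant = lever_output_force(applied, D1, D2)
print(f"force gain from the lever: {resultant / applied:.2f}")
```

Whenever D2 is smaller than D1 the gain is greater than 1, as the caption to Figure 2 states.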

The middle ear has another function in addition to the mechanical transformation of the auditory signal. When the auditory system is subjected to very loud sounds that are potentially harmful to the inner ear, two sets of muscles, the tensor tympani and the stapedius muscles, contract and in so doing reduce the magnitude of the vibration transmitted through the middle ear. The response of these muscles to loud noises is known as the acoustic or middle ear reflex.

3 The structure and function of the inner ear

3.1 Introduction

The inner ear (Figure 3) can be divided into three parts: the semicircular canals, the vestibule and the cochlea, all of which are located in the temporal bone. The semicircular canals and the vestibule affect the sense of balance and are not concerned with hearing. However, the cochlea, and what goes on inside it, provides the key to understanding many aspects of auditory perception and will therefore be dealt with in some detail.

Figure 3 The inner ear showing the semicircular canals and the cochlea

3.2 The anatomy of the cochlea

The cochlea has a spiral shape resembling the shell of a snail (Figure 4a). You can approximate the structure of the cochlea by wrapping a drinking straw 2.5 times around the tip of a sharpened pencil. The hollow tube, represented by the straw, has walls made of bone and the central pillar of the cochlea, represented by the pencil, is a conical bony structure called the modiolus. Unravelled (Figure 4b), the cochlea's hollow tube is about 32 mm long and 2 mm in diameter. The tube of the cochlea is divided into three chambers: the scala vestibuli, the scala media (or cochlear duct) and the scala tympani. The three scalae wrap around inside the cochlea like a spiral staircase (‘scala’ is Latin for ‘stairway’). The scala vestibuli forms the upper chamber and at the base of this chamber is the oval window. The lowermost of the three chambers is the scala tympani. It too has a basal aperture, the round window, which is closed by an elastic membrane. The scala media or cochlear duct separates the other two chambers along most of their length. The start of the cochlea, where the oval and round windows are located, is known as the basal end, while the other end, the inner tip, is known as the apical end (or apex). The scala vestibuli and the scala tympani communicate with one another via the helicotrema, an opening in the cochlear duct at the apex. Both scala vestibuli and scala tympani are filled with the same fluid, known as perilymph (essentially the same in composition as the extracellular fluid bathing most of the nervous system), while the scala media is filled with endolymph (with very high potassium and low sodium concentrations).

Figure 4 (a): Picture by Mireille Lavigne-Rabillard, from ‘Promenade around the cochlea’, by R. Pujol, S. Blatrix, T.Pujol and V. Reclar-Enjalbert, CRIC, University Montpelli
Figure 4 (a) coiled cochlea of a human foetus of five months gestation (magnification × 40); (b) diagrammatic representation of the three scalae of the cochlea (uncoiled)

Figure 5 is a cross-section of the cochlea showing the three chambers which run along its length. Between the scala vestibuli and the scala media is a membrane called Reissner's membrane and between the scala tympani and the scala media is the basilar membrane. Lying on top of the basilar membrane within the cochlear duct is the organ of Corti, and hanging over the organ of Corti is the tectorial membrane. The collective term for the partitions of the scala media (the organ of Corti, the basilar membrane and the tectorial membrane) is the cochlear partition.

Figure 5 Cross-section of the cochlea

Figure 6 Construction of a ‘cochlea’

The following simple activity will help you understand the structure of the cochlea more clearly. Imagine that your empty coffee mug (assuming it's a cylindrical shape, or thereabouts) represents a section of the uncoiled tube-like cochlea. Take a piece of paper that is about the same height as the mug and that is wide enough to wrap halfway around the mug (Figure 6a). Fold it in half and insert it into the middle of the mug so that the fold runs vertically down the side of the mug where the handle is located (Figure 6b). The paper should lie across the middle of the mug. Now (if necessary) separate the two pieces of paper so that they form a V-shape (Figure 6c). You now have the basic structure of the cochlea. If you hold the mug by the handle in your right hand and look directly into it, you have the same view as that shown in Figure 5. Compare Figure 5 to your ‘cochlea’ and answer the following questions:

Activity

What structure does the top piece of paper represent?

Answer

Reissner's membrane.

Activity

What does the space above the top piece of paper represent?

Answer

The scala vestibuli.

Activity

What does the bottom piece of paper represent?

Answer

The basilar membrane.

Activity

What does the space below the bottom piece of paper represent?

Answer

The scala tympani.

Activity

What does the V-shaped space represent?

Answer

The scala media (or cochlear duct).

The scala media houses the organ of Corti. The organ of Corti and all its associated structures (including the hair cells, see below) run the length of the basilar membrane (from the top to the bottom of your mug), as does the overlying tectorial membrane.

An enlargement of the organ of Corti is shown in Figure 7. The organ of Corti is the primary auditory receptor structure and houses the sensory receptor cells which are known as hair cells because each has about 100 hair-like stereocilia extending from its apical end which are embedded in the tectorial membrane. You can see that the hair cells are of two types: outer hair cells and inner hair cells, which are separated by a rigid inverted V-shaped structure known as Corti's arch.

We shall return to the organ of Corti and the hair cells and their involvement in the transduction of an auditory signal into neural information in Section 3.4. For now, we need to consider the basilar membrane and in particular, its response to vibrations in the cochlear fluid.

Figure 7(b): Copyright © Science Photo Library
Figure 7: The organ of Corti: (a) sketch cross-section. (b) Coloured scanning electron micrograph. Four rows of hair cells can be seen, which are supported by pillar cells (magnification ×600).

3.3 The role of the basilar membrane in sound reception

So far we know that sound-induced increases and decreases in air pressure move the tympanum inwards and outwards. The movement of the tympanum displaces the malleus which is fixed to its inner surface. The motion of the malleus and hence the incus results in the stapes functioning like a piston – alternately pushing into the oval window and then retracting from it. Since the oval window communicates with the scala vestibuli, the action of the stapes pushes and pulls cyclically on the fluid in the scala vestibuli. When the stapes pushes in on the oval window, the liquid in the scala vestibuli is displaced. If the membranes inside the cochlea were rigid, then the increase in fluid pressure at the oval window would displace the fluid up the scala vestibuli, through the helicotrema and down the scala tympani causing the round window to bulge out. This is actually a fairly accurate description of what happens except that the membranes inside the cochlea are not rigid. As a consequence, the increase in pressure in the cochlear fluid caused by the inward movement of the stapes also displaces fluid in the direction of the cochlear partition, which is deflected downwards. This downward deflection in turn causes the elastic basilar membrane to move down and also increases the pressure within the scala tympani. The enhanced pressure in the scala tympani displaces a fluid mass that contributes to outward bowing of the round window. When the stapes pulls back, the process is reversed and the basilar membrane moves up and the round window bows inwards. In other words, each cycle of a sound stimulus evokes a complete cycle of up-and-down movement of the basilar membrane and provides the first step in converting the vibration of the fluid within the cochlea into a neural code. The mechanical properties of the basilar membrane are the key to the cochlea's operation.

One critical feature of the basilar membrane is that it is not uniform. Instead, its mechanical properties vary continuously along its length in two ways. First, the membrane is wider at its apex compared to the base by a factor of about 5, and second, it decreases in stiffness from base to apex, the base being 100 times stiffer.
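One rough way to see why the stiffness gradient matters is to treat a short segment of the membrane as a mass on a spring, whose natural frequency is proportional to the square root of stiffness divided by mass. This is only an analogy, not a model of the real membrane (which is continuous, coupled to the surrounding fluid, and also varies in width and mass); the units below are arbitrary and only the ratio is meaningful.

```python
import math

def resonant_frequency(stiffness, mass):
    """Natural frequency (Hz) of an ideal mass on a spring."""
    return math.sqrt(stiffness / mass) / (2 * math.pi)

# Arbitrary illustrative units; mass is held equal at both ends (an assumption).
mass = 1.0
apex_stiffness = 1.0
base_stiffness = 100.0 * apex_stiffness  # base about 100 times stiffer (from the text)

ratio = (resonant_frequency(base_stiffness, mass)
         / resonant_frequency(apex_stiffness, mass))
print(f"base resonates about {ratio:.0f} times higher than apex in this analogy")
```

In this simplified picture, a hundredfold difference in stiffness by itself would give a tenfold difference in preferred frequency between the two ends.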

Figure 8 Schematic representation of the basilar membrane (cochlea uncoiled) showing the variation in width along its length

So, the base is narrow and stiff compared to the apex (Figure 8). This means that stimulation by a pure tone results in a complex movement of the membrane. If it were uniform, then the fluctuating pressure difference between the scala vestibuli and the scala tympani caused by the sound would move the entire membrane up and down with similar excursions at all points. However, because of the variation in width and stiffness along its length, various parts of the membrane do not oscillate in phase. Over a complete cycle of sound each segment of the membrane undergoes a single cycle of vibration but at any point in time some parts of the membrane are moving upwards and some parts are moving downwards. The overall pattern of movement of the membrane is described as a travelling wave.

Figure 9 Instantaneous pattern of a travelling wave along the basilar membrane. (a) The pattern that would result if the membrane were ribbon-like. (b) The vibration of the membrane represented more realistically

To visualise the motion of a travelling wave, think of the wave that travels along a piece of ribbon if you hold one end in your hand and give it a flick. Figure 9a shows what you might expect from flicking a ribbon. Figure 9b is a more realistic representation of the wave on the basilar membrane because the basilar membrane is attached at its edges and is displaced in response to sound in a transverse (crosswise) direction as well as a longitudinal direction.

Activity

What do you notice about the change in amplitude of the wave as it travels along the membrane?

Answer

As it travels, the wave reaches a peak amplitude that then rapidly falls. The amplitude of the wave is therefore greatest at a particular location on the membrane.

Figure 10 The envelope formed by a 200 Hz tone. The shape of the envelope is described by the set of momentary locations (four shown here) traced by the travelling wave along the basilar membrane

A travelling wave then, is a unique moving waveform whose point of maximal displacement traces out a specific set of locations. The shape described by the set of these locations along the basilar membrane is called the envelope of the travelling wave (Figure 10). The point along the basilar membrane where the wave, and hence the envelope traced by the travelling wave, reaches a peak differs for each frequency. In other words, each point along the basilar membrane that is set in motion vibrates at the same frequency as the sound impinging on the ear, but different frequency sounds cause a peak in the wave at different positions on the basilar membrane (Figure 11a).

Figure 11 (a) A highly schematic map of frequency representation on the basilar membrane showing that the part of the basilar membrane that responds to sound depends on the frequency of the sound. (b) A schematic representation of the cochlea and the envelope of a travelling wave that would occur for stimuli of three different frequencies. One instantaneous waveform is shown for each frequency. (c) Displacement of the basilar membrane in response to a signal composed of two sinusoidal waves of 300 Hz and 2000 Hz.

Look at Figure 11b.

Activity

What do you notice about the point of maximum displacement for each frequency?

Answer

For the lowest frequency (60 Hz) the maximum displacement is near the apical end, for the highest frequency (2000 Hz) the maximum displacement is near the base, while the intermediate frequency has maximal displacement between the two.

Therefore, high-frequency sounds cause only a small region of the basilar membrane near the stapes to move, while low frequencies cause almost the entire membrane to move, with the peak displacement located near the apex. This shows that the travelling wave always travels from base to apex, and that how far towards the apex it travels depends on the frequency of stimulation; lower frequencies travel further.
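The relationship between frequency and place can be sketched numerically. The course itself gives no formula, so the example below borrows a Greenwood-style frequency-position function with constants that are commonly quoted for the human cochlea; treat both the function and the constants as assumptions. The 32 mm length is the figure given earlier for the uncoiled cochlea.

```python
import math

COCHLEA_LENGTH_MM = 32.0  # uncoiled length quoted earlier in the text

# Greenwood-style map: frequency = A * (10**(ALPHA * x) - K), where x is the
# fractional distance from the apex (0) to the base (1).
# Constants are commonly quoted human values, assumed here for illustration.
A = 165.4
ALPHA = 2.1
K = 0.88

def frequency_at(fraction_from_apex):
    """Frequency (Hz) giving peak displacement at this fractional position."""
    return A * (10 ** (ALPHA * fraction_from_apex) - K)

def distance_from_base_mm(freq_hz):
    """Approximate position of peak displacement, measured from the base."""
    fraction_from_apex = math.log10(freq_hz / A + K) / ALPHA
    return (1.0 - fraction_from_apex) * COCHLEA_LENGTH_MM

for f in (60, 300, 2000):
    print(f"{f:5d} Hz peaks about {distance_from_base_mm(f):.1f} mm from the base")
```

Under these assumptions the 60 Hz peak lies close to the apex and the 2000 Hz peak much nearer the base, in line with Figure 11b.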

Activity

What would the response of the membrane be if the sound impinging on the ear was a complex sound consisting of frequencies of 300 Hz and 2000 Hz?

Answer

Each frequency would create a maximum displacement at a different point along the basilar membrane (as shown in Figure 11c).

The separation of a complex signal into two different points of maximal displacement along the membrane, corresponding to the sinusoidal waves of which the complex signal is composed, means that the basilar membrane is performing a type of spectral (Fourier) analysis. (Fourier analysis is the process of decomposing a waveform into its sinusoidal components.) The basilar membrane displacement therefore provides useful information about the frequency of the sound impinging on the ear by acting like a series of band-pass filters. Each section of the membrane passes, and therefore responds to, all sinusoidal waves with frequencies between two particular values. It does not respond to frequencies that are present in the sound but fall outside the range of frequencies of that section.
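The idea of decomposing a waveform into its sinusoidal components can be tried out directly. The sketch below builds a signal from 300 Hz and 2000 Hz sinusoids and then measures how much of each of several probe frequencies it contains, using one term of a discrete Fourier transform per probe. The sampling rate, duration and component amplitudes are arbitrary choices made for the illustration.

```python
import cmath
import math

SAMPLE_RATE = 16000  # samples per second (assumed)
N = 1600             # 0.1 s of signal, a whole number of cycles for every probe

# Complex sound: a 300 Hz sinusoid plus a weaker 2000 Hz sinusoid.
signal = [math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE)
          for n in range(N)]

def amplitude_at(freq_hz, samples, sample_rate):
    """Amplitude of the sinusoidal component at freq_hz (single DFT term)."""
    total = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
                for n, s in enumerate(samples))
    return 2 * abs(total) / len(samples)

for f in (100, 300, 1000, 2000, 4000):
    print(f"{f:5d} Hz: amplitude {amplitude_at(f, signal, SAMPLE_RATE):.3f}")
```

Only the 300 Hz and 2000 Hz probes report a non-negligible amplitude, which is loosely analogous to only two regions of the basilar membrane showing a peak in displacement.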

The filter characteristics of the basilar membrane can be studied using the technique of laser interferometry. Figure 12 shows the results of such a study. The data were collected by presenting different frequency sounds to the inner ear of a chinchilla and then measuring the level of each tone that is required to displace the basilar membrane by a fixed amount. Measurements are taken at a particular point on the basilar membrane.

Figure 12 The sound level required to maintain the basilar membrane at a constant displacement (1.9 × 10⁻⁸ m) as a function of the frequency of the tonal input

Activity

From Figure 12, determine the frequency of the tone that required the lowest sound level to displace the basilar membrane by a set amount.

Answer

A little under 10 000 Hz (in fact 8350 Hz or 8.35 kHz).

This frequency is known as the characteristic, critical or central frequency (CF) of that part of the membrane because it is most sensitive to (or tuned to) frequencies in the region of 8 kHz.

For frequencies above and below 8.35 kHz the tone had to be more intense in order to vibrate the membrane to the same extent as that caused by the 8.35 kHz tone. This particular point on the membrane therefore acts as a filter in that it responds maximally to tones of 8.35 kHz, but shows very little response to tones that are higher or lower than this.
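The general shape of such a tuning curve can be imitated with an idealised band-pass filter. The sketch below uses a simple second-order resonance centred on 8350 Hz and reports how much extra sound level would be needed, at other frequencies, to produce the same response as at the centre. It is not a fit to the chinchilla data: the sharpness value Q is hypothetical, and real basilar membrane tuning curves are not symmetric in this way.

```python
import math

CF_HZ = 8350.0  # characteristic frequency read from Figure 12
Q = 5.0         # hypothetical sharpness of tuning, for illustration only

def relative_response(freq_hz, cf_hz=CF_HZ, q=Q):
    """Gain of an idealised second-order band-pass filter (1.0 at cf_hz)."""
    detune = freq_hz / cf_hz - cf_hz / freq_hz
    return 1.0 / math.sqrt(1.0 + (q * detune) ** 2)

def extra_level_db(freq_hz):
    """Extra level (dB) needed to match the response at the centre frequency."""
    return 20 * math.log10(1.0 / relative_response(freq_hz))

for f in (2000, 5000, 8350, 12000, 16000):
    print(f"{f:6d} Hz: about {extra_level_db(f):5.1f} dB above the level needed at CF")
```

The required level is lowest at the characteristic frequency and rises on either side, which is the defining behaviour of a band-pass filter.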

In the next section we shall see how the band-pass filtering characteristics of the basilar membrane are preserved in the discharge pattern of nerve fibres that leave the cochlea.

The motion of the basilar membrane also provides information about the temporal pattern of acoustic stimulation: it takes longer for a low-frequency stimulus to reach its point of maximum displacement on the membrane than it does a high-frequency stimulus.

Activity

Why is this?

Answer

Because high-frequency stimuli cause maximal displacement of the membrane near the base of the cochlea (near the stapes), whereas low frequencies cause maximal displacement at the apical end. Since the travelling wave always travels from base to apex, it takes longer for the wave to reach a point near the apex than one near the base.

Finally, the mechanics of the basilar membrane provide information regarding the level of acoustic stimulation. The greater the stimulus level, the greater the amount of basilar membrane displacement. Therefore, more intense signals cause greater membrane displacement at a particular point than less intense stimuli.

You should now read The mechanics of hearing by Jonathan Ashmore, attached below. There may be some terms and concepts that will not be familiar to you. Do not worry too much at this stage. There is some overlap in the material covered in this course and some of the concepts mentioned in the reading will be more comprehensively covered in later sections of the course.


Summary

The ear is made up of the outer, middle and inner ears. The outer ear consists of the pinna, the external auditory canal and the tympanic membrane. The middle ear is air-filled and contains the middle ear ossicles. The inner ear is fluid-filled and contains the cochlea, the semicircular canals and the vestibule.

Sound in the external environment is channelled into the auditory meatus by the pinna and impinges on the tympanic membrane causing it to vibrate. These vibrations are transmitted to the inner ear via the middle ear ossicles. The ossicles act as an impedance-matching device and amplify vibrations between the outer and inner ear. They also function in preventing damage to the inner ear by very loud sounds via the middle ear reflex.

The inner ear contains the cochlea which has three compartments: the scala tympani, the scala vestibuli and the scala media (cochlear duct). Inside the cochlear duct is the organ of Corti. The organ of Corti contains the sensory receptors that are called hair cells, sits on top of the basilar membrane and is covered by the tectorial membrane.

The stapes connects to the scala vestibuli via the oval window. Movement of the stapes in response to sound causes the fluid in the scala vestibuli to vibrate. This causes the basilar membrane to move. The motion is described as a travelling wave. The base of the membrane is about five times narrower and about 100 times stiffer than the apex.

The basilar membrane has a frequency-to-place conversion for pure-tone stimuli. High-frequency sounds cause greatest vibration near the base of the membrane, and low frequencies cause greatest vibration near the apex.

The basilar membrane acts like a band-pass filter. Each point on the membrane corresponds to a band-pass filter with a different centre frequency. This means that sounds of different frequency result in maximal displacement at different points along the membrane.

3.4 The organ of Corti and hair cells

We have established that the vibration patterns of the basilar membrane carry information about frequency, amplitude and time. The next step is to examine how this information is converted or coded into neural signals in the auditory nervous system. To do so, we must look at the organ of Corti in some detail since it is here that the auditory receptor cells that convert mechanical energy into a change in membrane polarisation are located.

As we saw in Section 3.2, the receptor cells responsible for the transduction of mechanical energy into neural energy are called hair cells. A typical hair cell is shown in Figure 13. The outer hair cells are closest to the outside of the cochlea and are arranged in three rows, whereas the inner hair cells form a single row (see Figure 7). There are about 12 000 outer hair cells and about 3500 inner hair cells in the human ear. The tips of the tallest row of cilia of each hair cell are in contact with the tectorial membrane, which is situated at the top of the organ of Corti and has a soft, ribbon-like structure.

Figure 13 The structure of a hair cell. Inside the cell body can be seen the nucleus, a synaptic bar, and synaptic vesicles which carry the neurotransmitter molecules

3.5 Neural transduction

The critical event for the transduction of sound into a neural signal is the bending of the stereocilia of the hair cells. In this section we will examine how the flexing of the basilar membrane leads to the bending of the stereocilia and the production of a neural signal.

3.5.1 Hair cells transform mechanical energy into neural signals

The tectorial membrane runs parallel to the basilar membrane, so when the basilar membrane vibrates up and down in response to motion at the stapes, so does the tectorial membrane. However, as shown in Figure 14, the displacement of the membranes causes them to pivot about different hinging points and this creates a shearing force between the hair cell stereocilia embedded in the tectorial membrane and the hair cells themselves which rest on the basilar membrane. Shearing is a particular form of bending in which, in this case, the top moves more than the bottom. It is this shearing force that transduces mechanical energy into electrical energy which is transmitted to the auditory nerve fibres.

Activity

What kind of sensory receptor transduces mechanical energy into electrical energy?

Answer

Mechanoreceptors.

Figure 14 Schematic diagrams of shearing forces created between the hair cells and the tectorial membrane as a result of basilar membrane displacement. (a) Shearing force that results from displacement of the basilar membrane towards the scala vestibuli when the basilar membrane is driven upwards. (b) Relationship between hair cells and tectorial membrane with no stimulation. (c) Shearing forces in the direction opposite to that shown in (a) after displacement in the opposite direction

In order for the hair cell to transduce stereocilia shearing (mechanical) forces into an electrical (neural) response, the permeability of the hair cell membrane must change. This happens when the shearing motion, which is a mechanical stimulus, opens ion channels in the cell's plasma membrane and the current flowing through these channels alters the cell's membrane potential (this is the electrical response). So, in response to a mechanical stimulus, there is an influx of ions into the cell which disturbs the resting potential of the cell membrane, driving the membrane potential to a new level called the receptor potential. The channels are relatively non-selective about which ions they allow to pass through them. However, you should recall from Section 3.2 and from The mechanics of hearing by Jonathan Ashmore, that potassium is very plentiful in the endolymph. The stereocilia of the hair cells are bathed in endolymph whereas the basal region of the cell is bathed in perilymph (which is relatively low in potassium). So once the channels are opened, potassium ions flow into the hair cell.

Activity

How does this differ from most other cells?

Answer

In most cells, the flow of potassium ions is outwards because the cell is higher in potassium than the surrounding medium. In the case of hair cells however, the endolymph surrounding the stereocilia has a higher level of potassium than the cell, and the flow is in the reverse direction.
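The direction of potassium flow can be illustrated with the Nernst equation, which gives the membrane potential at which there is no net flow of an ion for a given pair of concentrations. The concentrations below are illustrative round numbers, not values from this course, and the sketch ignores any electrical potential difference between endolymph and perilymph. The resting potential of about −50 mV is the approximate figure used in this section.

```python
import math

R = 8.314     # gas constant, J/(mol·K)
F = 96485.0   # Faraday constant, C/mol
T = 310.0     # body temperature, K

def nernst_mv(conc_outside_mm, conc_inside_mm, valence=1):
    """Nernst equilibrium potential in mV (inside relative to outside)."""
    return 1000 * R * T / (valence * F) * math.log(conc_outside_mm / conc_inside_mm)

# Assumed potassium concentrations in mM, for illustration only.
K_INSIDE = 140.0
K_ENDOLYMPH = 150.0   # fluid bathing the stereocilia (high potassium)
K_PERILYMPH = 5.0     # fluid bathing the base of the cell (low potassium)
RESTING_MV = -50.0    # approximate hair cell resting potential

for name, k_out in (("apical (endolymph)", K_ENDOLYMPH),
                    ("basal (perilymph)", K_PERILYMPH)):
    e_k = nernst_mv(k_out, K_INSIDE)
    direction = "into" if RESTING_MV < e_k else "out of"
    print(f"{name}: E_K = {e_k:6.1f} mV, so potassium tends to flow {direction} the cell")
```

With these assumed values, potassium tends to enter at the stereocilia and leave at the base of the cell, consistent with the reversed direction of flow described in the answer above.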

In fact, when a hair bundle is displaced by a mechanical stimulus, its response depends on the direction and magnitude of the stimulus. In an unstimulated cell about 10 per cent of the ion channels are open. As a result, the cell's resting potential (about −50 mV) is determined, in part, by the inward flow of current. A positive stimulus that displaces the stereocilia towards the tall edge opens additional channels and the resultant influx of positive ions depolarises the cell by as much as tens of mV. A negative stimulus that displaces the stereocilia towards the short edge shuts the channels that are open at rest and hyperpolarises the cell (Figure 15).

This directional sensitivity of the cells, their arrangement on the organ of Corti and the hypothesised motion of the organ of Corti in response to a stimulus together mean that an upward movement of the basilar membrane leads to depolarisation of the cells, whereas a downward deflection elicits hyperpolarisation.

The receptor potential of a hair cell is graded; as the stimulus amplitude increases, the receptor potential grows increasingly larger, up to a maximal point of saturation. The relationship between a bundle's deflection and the resulting electrical response is S-shaped (Figure 15d). This results in a high degree of sensitivity. A small displacement of only 100 nm (100 × 10⁻⁹ m) represents 90 per cent of the response range of the hair cell (shaded part). Deflection of a hair cell by the width of a hydrogen atom is enough to make the cell respond.
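The S-shaped relationship can be sketched numerically. The following is only an illustrative model, not one given in this course: it assumes a two-state Boltzmann curve for the fraction of open channels, takes the "90 per cent of the response range" to run from 5 to 95 per cent open, and fixes the resting open fraction at the 10 per cent quoted above. The names `SLOPE_NM`, `MIDPOINT_NM` and `open_fraction` are hypothetical.

```python
import math

RESTING_OPEN_FRACTION = 0.10   # from the text: about 10 per cent open at rest
RANGE_NM = 100.0               # from the text: 100 nm spans 90 per cent of the range

# Assumption: two-state Boltzmann curve, with the 90 per cent range
# interpreted as the span between 5 and 95 per cent of channels open.
SLOPE_NM = RANGE_NM / (2.0 * math.log(19.0))
# Midpoint chosen so that zero deflection gives the resting open fraction.
MIDPOINT_NM = SLOPE_NM * math.log(
    (1.0 - RESTING_OPEN_FRACTION) / RESTING_OPEN_FRACTION
)


def open_fraction(deflection_nm):
    """Fraction of transduction channels open for a given bundle deflection.

    Positive deflection is towards the tall edge of the bundle.
    """
    return 1.0 / (1.0 + math.exp(-(deflection_nm - MIDPOINT_NM) / SLOPE_NM))


if __name__ == "__main__":
    for x in (-50, 0, 50, 100, 150):
        print(f"{x:5d} nm -> {open_fraction(x):.2f} of channels open")
```

In this sketch a negative deflection pushes the open fraction below its resting value (hyperpolarisation) and a positive deflection raises it towards saturation (depolarisation), which is the behaviour described for Figure 15d.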

Figure 15
Figure 15 Sensitivity of hair cells. (a) A schematic drawing of a hair cell with an electrode inserted into its cytoplasm. (b) Application of a mechanical force to the hair bundle causes a deflection in the stereocilia. (c) When the top of a hair bundle is displaced back and forth by a stimulus probe, the opening and closing of mechanically-sensitive channels produces an oscillatory receptor potential. (d) The sigmoidal relationship between hair bundle deflection (horizontal axis) and receptor potential (vertical axis) in a stimulated hair cell. The shaded area shows that a small displacement of only 100 nm represents 90 per cent of the response range of the hair cell

3.5.2 Mechanical force directly opens and closes transduction channels

It is believed that tip links aid in causing ‘channels’ to open and close near the top of the hair cell (Figure 16). Tip links are filamentous connections between two stereocilia. Each tip link is a fine fibre obliquely joining the distal end of one stereocilium to the side of the longest adjacent process. It is thought that each link is attached at one end or both to the molecular gates of one or a few channels. Under this arrangement, pushing a bundle in one direction increases the tension on the tip link and promotes channel opening while pushing the bundle in the opposite direction slackens the link and the associated channel closes.

Figure 16
Figure 16 Mechanism of mechano-electrical transduction. When the hair bundle is at rest each transduction channel oscillates between closed and open states, spending most of the time closed. Displacement of the bundle in the positive direction increases tension on the tip link and promotes channel opening, the influx of cations and a depolarizing receptor potential

3.5.3 Mechano-electrical transduction is rapid

Many other sensory receptors, such as photoreceptors and olfactory neurons, employ second messengers in the transduction process. This is not true for hair cells. The rapidity with which they respond makes this impossible. In order to deal with the frequencies of biologically relevant stimuli, transduction must be rapid. The highest frequency humans can hear is about 20 000 Hz. This in effect means that hair cells must be able to turn current on and off 20 000 times per second (200 000 times per second for a bat). Also, localisation of sound sources (Section 12) requires that animals are able to resolve very small time differences, in the order of 10 μs.
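To see how little time this leaves for transduction, the duration of one stimulus cycle can be worked out directly from the frequencies quoted above. This is simple arithmetic rather than a model; the function name `cycle_period_us` is made up for illustration.

```python
def cycle_period_us(frequency_hz):
    """Duration of one cycle of a tone, in microseconds."""
    return 1e6 / frequency_hz


human_upper_limit_us = cycle_period_us(20_000)   # one cycle at 20 000 Hz
bat_upper_limit_us = cycle_period_us(200_000)    # one cycle at 200 000 Hz

if __name__ == "__main__":
    print(f"Human, 20 000 Hz: {human_upper_limit_us:.0f} microseconds per cycle")
    print(f"Bat, 200 000 Hz:  {bat_upper_limit_us:.0f} microseconds per cycle")
```

A full cycle at the upper limit of human hearing lasts only a few tens of microseconds, which is far too short for a multi-step second-messenger cascade.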

3.6 Synaptic transmission from hair cells

In addition to being sensory receptors, hair cells are also presynaptic terminals. The membrane at the base of each hair cell contains several presynaptic active zones, where chemical neurotransmitter is released. When the hair cells are depolarised, chemical transmitter is released from the hair cells to the cells of the auditory nerve fibres. Excited by this chemical transmitter, the afferent nerve fibres contacting the hair cells fire a pattern of action potentials that encode features of the stimulus. We will return to how this information is encoded in Section 6. As in other synapses, the depolarisation that leads to transmitter release acts through an intermediary, namely calcium ions. Depolarisation opens channels at the base of the hair cell (voltage-gated calcium channels), which allow calcium ions to enter from the surrounding perilymph resulting in the release of transmitter (Figure 17). Calcium also has another function: it opens potassium channels, called calcium-activated potassium channels, which allow potassium ions to leave the cells because the perilymph on the other side is low in potassium. The efflux of potassium ions through these calcium-activated channels repolarises the cell. The identity of the neurotransmitter is controversial. Glutamate appears to be the transmitter in some cases but there is also evidence for another, as yet unidentified, substance.

Figure 17
Figure 17 Depolarisation of a hair cell. Entry of potassium ions depolarises the hair cell which opens voltage-gated calcium channels. Incoming calcium ions further depolarise the cell leading to the release of chemical transmitter to the afferent nerve fibre contacting the hair cell

3.7 Hair cell tuning

We have seen that the location of the peak of the travelling wave on the basilar membrane is determined by the frequency of the originating sound. The hair cells run the length of the basilar membrane. When a sound of a certain frequency stimulates a point on the membrane, that point responds by moving, and the hair cells at that site are stimulated by the shearing force that this movement creates (Figure 18). Groups of hair cells therefore only respond if certain frequencies are present in the originating sound. The frequency sensitivity of a hair cell can be displayed as a tuning curve. To construct a tuning curve, a single hair cell is stimulated repeatedly with pure tone stimuli of various frequencies. For each frequency, the intensity of the stimulus is adjusted until the response of the hair cell reaches some predefined level. The tuning curve is then the graph of sound intensity against stimulus frequency (Figure 19). Tuning curves for hair cells are characteristically V-shaped. The tip represents the frequency to which the cell is most sensitive.

Figure 18
Figure 18 Schematic diagram of the basilar membrane and hair cell tuning. A 4 kHz sound results in a peak in the travelling wave at position B. The hair cell at this position is stimulated by the bending of the stereocilia. The depolarisation results in transmitter release and the generation of an action potential in the auditory nerve fibre
Figure 19
Figure 19 Tuning curves of hair cells located at different positions on the basilar membrane

A sound of this frequency will elicit a response from the cell even when it is of very low intensity. Sounds of greater or lesser frequency require higher intensity to excite the cell to the predetermined level.
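The procedure for constructing a tuning curve can be sketched in code. The hair cell here is a toy stand-in, not a physiological model: `toy_response`, its characteristic frequency of 4 kHz, the 40 dB per octave fall-off and the criterion level are all assumptions chosen only so that the search has something to measure. The part that mirrors the text is `threshold_db`, which adjusts the intensity at each frequency until the response reaches the predefined level.

```python
import math

CF_HZ = 4000.0     # hypothetical characteristic frequency of the toy cell
CRITERION = 1.0    # hypothetical criterion response (arbitrary units)


def toy_response(freq_hz, level_db):
    """Toy cell: response grows with level and falls off away from CF."""
    octaves_from_cf = abs(math.log2(freq_hz / CF_HZ))
    effective_db = level_db - 40.0 * octaves_from_cf   # assumed 40 dB/octave loss
    return 10 ** ((effective_db - 10.0) / 20.0)        # equals 1.0 at 10 dB effective


def threshold_db(freq_hz, lo=-20.0, hi=140.0, tol=0.01):
    """Bisect on stimulus level until the response just reaches CRITERION."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if toy_response(freq_hz, mid) >= CRITERION:
            hi = mid      # criterion reached: try a quieter stimulus
        else:
            lo = mid      # criterion not reached: try a louder stimulus
    return hi


frequencies = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0]
tuning_curve = [(f, threshold_db(f)) for f in frequencies]

if __name__ == "__main__":
    for freq, level in tuning_curve:
        print(f"{freq:8.0f} Hz -> threshold {level:6.1f} dB")
```

Plotting threshold against frequency for this toy cell gives a V-shaped curve with its tip at the characteristic frequency. The toy curve is symmetrical, which real tuning curves are not.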

Adjacent piano strings are tuned to frequencies some 6 per cent apart. On average, successive inner hair cells differ in characteristic frequency by about 0.2 per cent.
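The comparison can be made concrete with a short calculation. It assumes equal temperament, in which adjacent notes differ by a factor of the twelfth root of two (just under 6 per cent), and it treats the 0.2 per cent step between successive inner hair cells as constant. Both are simplifications.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)   # assumption: equal temperament, about 1.0595
HAIR_CELL_RATIO = 1.002          # from the text: about 0.2 per cent per cell

semitone_percent = (SEMITONE_RATIO - 1.0) * 100.0
# Number of successive 0.2 per cent steps needed to span one semitone.
cells_per_semitone = math.log(SEMITONE_RATIO) / math.log(HAIR_CELL_RATIO)

if __name__ == "__main__":
    print(f"Adjacent piano strings differ by {semitone_percent:.1f} per cent")
    print(f"Roughly {cells_per_semitone:.0f} inner hair cells span that interval")
```

On these assumptions, a few dozen inner hair cells lie between the places tuned to two adjacent piano notes.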

The great majority of neurons that carry information from the cochlea to higher levels of the auditory system connect to the inner hair cells. Thus most, if not all, information about sounds is conveyed to the brain via the inner hair cells. Given that the outer hair cells greatly outnumber the inner hair cells, it seems paradoxical that most cochlear output is derived from the inner cells. However, ongoing research suggests that outer hair cells do play an important role in the transduction process. Membranes of the outer cells contain a motor protein that changes the length of the outer hair cells in response to stimulation. This change in length effects a change in the mechanical coupling between the basilar and tectorial membranes. Outer hair cells are sometimes said to constitute a cochlear amplifier by amplifying the response of the basilar membrane. This causes the stereocilia on the inner cells to bend more, creating a bigger response in the auditory nerve (Figure 20).

Figure 20
Figure 20 Amplification by outer hair cells. (a) Motor proteins in the membranes of the outer hair cells are expanded when the cells are in a resting state. (b) When potassium enters the cell, motor proteins are activated and contract the hair cell. (c) Conformational change in the hair cell increases the bending of the basilar membrane. (d) If the cochlear amplifier is deactivated (for example, with drugs) bending of the basilar membrane is decreased dramatically

3.7.1 Summary of Sections 3.4 to 3.7

Hair cells are found in the organ of Corti and run the length of the basilar membrane. They transform mechanical energy into neural signals.

When the basilar membrane vibrates in response to sound, hair cells located at the site of maximal vibration on the basilar membrane are stimulated. This means that the mechanical properties of the membrane allow the auditory system to distinguish one frequency from another by the location on the membrane that is maximally excited by a particular frequency. Hair cells located at the place of maximum excitation respond, allowing the auditory system to extract information about the frequency of a sound.

Auditory transduction occurs when the basilar membrane moves up and down and the cilia of the inner hair cells rub against the tectorial membrane. The bending of the cilia produces an electrical response in the hair cells. Most of the transduction current is carried by potassium ions, potassium being the cation with the highest concentration in the endolymph bathing the hair bundle.

Displacement of the stereocilia towards the tall edge results in an influx of cations and a depolarisation of the hair cell. Displacement of stereocilia towards the short edge results in hyperpolarisation of the hair cell. Depolarisation of the hair cell allows calcium ions to enter the cell leading to the release of transmitter from the presynaptic terminals on the hair cell.

Outer hair cells help amplify vibrations of the basilar membrane.

3.8 Revision questions

Question 1

Discuss the two ways in which the middle ear increases the effectiveness with which sound is transmitted from the external ear to the inner ear.

Answer

The first way in which the middle ear enhances the efficiency of sound transfer is to do with the relative sizes of the tympanic membrane and the stapes footplate (which is connected to the oval window). Measurements have shown that the area of the tympanic membrane that vibrates in response to high intensity sound is 55 mm².

The stapes footplate which makes contact with the oval window has an area of about 3.2 mm², which is considerably less than the effective area of the tympanic membrane. So, if all the force exerted on the tympanic membrane is transferred to the stapes footplate, then the force per unit area must be greater at the footplate because it is smaller than the tympanic membrane. Put simply, if the same force is applied to a large area and to a small area, the force applied to the smaller area will result in a bigger pressure change. You know that if you hit a wall with a hammer you make a small dent but if you hit a nail with a hammer swung with the same force, all the force is concentrated on a small point and the nail is driven into the wall. In fact the tympanic membrane and the footplate differ in size by a factor of 17 (55 mm² / 3.2 mm² = 17) so pressure at the footplate (force per unit area) is 17 times greater than at the tympanic membrane and therefore the air pressure can stimulate the fluid-filled inner ear.

The second way in which the middle ear enhances the efficiency of sound transfer is through the lever action of the ossicles. Figure 2 shows how a lever system can increase the force of an incoming signal. In the figure, the lever is pivoting around a fulcrum at point C. The distance D1 between the fulcrum and the point of the applied force is larger than the distance D2, between the fulcrum and the position of the resultant force. The increase in force due to lever action is given by the formula:

F_resultant = F_applied × (D1/D2)

Therefore the closer the fulcrum is to the point of the resultant force, the larger this force will be. The ossicles of the middle ear are arranged so that they act like a lever. The length of the malleus corresponds to D1 (the distance between the applied force and the fulcrum), while the incus acts as the lever portion between the fulcrum and the resultant signal (D2). Measurements of the length of these two bones indicate that the lever system of the ossicles increases the force by a factor of about 1.2 between the tympanic membrane and the stapes. In addition, the tympanic membrane tends to buckle as it moves, causing the malleus to move with about twice the force. So, overall the increase in pressure at the stapes footplate is in the region of 17 × 1.2 × 2 = 40.8. The reduction in sound level caused by the fluid/air interface is estimated to be about 30 dB. A pressure gain of about 40 corresponds to roughly 32 dB, so the middle ear approximately counteracts this reduction.
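The arithmetic in this answer can be checked with a few lines of code. The areas and the two force factors are the figures quoted above; the conversion of a pressure ratio to decibels uses the standard relation 20 × log10(ratio). The variable names are illustrative only.

```python
import math

TYMPANIC_AREA_MM2 = 55.0    # effective vibrating area of the tympanic membrane
FOOTPLATE_AREA_MM2 = 3.2    # area of the stapes footplate
LEVER_FACTOR = 1.2          # force gain from the lever action of the ossicles
BUCKLING_FACTOR = 2.0       # force gain from buckling of the tympanic membrane

area_ratio = TYMPANIC_AREA_MM2 / FOOTPLATE_AREA_MM2
pressure_gain = area_ratio * LEVER_FACTOR * BUCKLING_FACTOR
gain_db = 20.0 * math.log10(pressure_gain)   # pressure ratio expressed in dB

if __name__ == "__main__":
    print(f"Area ratio:    {area_ratio:.1f}")
    print(f"Pressure gain: {pressure_gain:.1f}")
    print(f"Gain in dB:    {gain_db:.1f}")
```

Using the unrounded area ratio gives a pressure gain slightly above the 40.8 obtained with the rounded value of 17. Either way the gain is a little over 30 dB, comparable to the estimated loss at the fluid/air interface.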

Question 2

Draw a flow-diagram to illustrate the route a sound (pressure) wave takes from the time it enters the external ear to the point at which it reaches the round window.

Answer

pinna → external auditory canal → tympanic membrane → malleus → incus → stapes → oval window → scala vestibuli → helicotrema → scala tympani → round window

Question 3

Describe the basic structure of the cochlea and discuss how the different structures contribute to the reception of sound.

Answer

The cochlea has a spiral shape resembling the shell of a snail. Unravelled, the cochlea's hollow tube is about 32 mm long and 2 mm in diameter. The tube of the cochlea is divided into three chambers: the scala vestibuli, the scala media (or cochlear duct) and the scala tympani. The three scalae wrap around inside the cochlea like a spiral staircase. The scala vestibuli forms the upper chamber and at the base of this chamber is the oval window. The lowermost of the three chambers is the scala tympani. It too has a basal aperture, the round window, which is closed by an elastic membrane. The scala media or cochlear duct separates the other two chambers along most of their length. The start of the cochlea, where the oval and round windows are located is known as the basal end, while the other end, the inner tip, is known as the apical end. The scala vestibuli and the scala tympani communicate with one another via the helicotrema, an opening in the cochlear duct at the apex. Both scala vestibuli and scala tympani are filled with the same fluid known as perilymph while the scala media is filled with endolymph.

Between the scala vestibuli and the scala media is a membrane called Reissner's membrane and between the scala tympani and the scala media is the basilar membrane. Lying on top of the basilar membrane within the cochlear duct is the organ of Corti, and hanging over the organ of Corti is the tectorial membrane. In response to sound entering the cochlea, the fluid within the cochlea vibrates. The key factor in the response of the inner ear is the mechanical response of the basilar membrane and organ of Corti. These two structures translate the mechanical vibrations of the inner ear fluids into neural responses in the auditory nerve. The vibration of the fluids causes the basilar membrane to move which creates a shearing motion between the basilar membrane and the overlying tectorial membrane. This in turn causes the cilia of the hair cells contained within the organ of Corti to bend. The bending of the cilia results in the nerve fibre at the base of the hair cell initiating a neural potential that is sent along the auditory nerve in the form of action potentials. Thus the hair cells in conjunction with the basilar membrane translate mechanical information into neural information.

Question 4

What is a travelling wave in the context of the response of the basilar membrane to an incoming sound signal?

Answer

It is the motion set up on the basilar membrane in response to movement of the cochlear fluids. The wave propagates from the base of the membrane towards the apex. The point of maximal displacement of the wave is determined by the frequency of the incoming sound.

Question 5

What are the different properties of the fluids found in the main compartments of the cochlea? How do they contribute to the transduction of a neural signal?

Answer

Endolymph is found in the scala media and perilymph is found in the scala vestibuli and scala tympani. Endolymph has an ionic concentration similar to that of intracellular fluid, high K⁺ and low Na⁺ (even though it is extracellular). Perilymph has an ionic content similar to that of cerebrospinal fluid, low K⁺ and high Na⁺. Because of the ionic concentration differences, the endolymph has an electrical potential that is about 80 mV more positive than the perilymph. The stereocilia of the hair cells are bathed in endolymph while the base of the hair cells and their afferent dendrites are bathed in perilymph. When the stereocilia are bent by movement of the basilar membrane, the hair cell either depolarises or hyperpolarises, depending on the direction in which they are bent. The receptor potential, which is either above or below the resting potential of the hair cell, results from opening or closing potassium channels in the tips of the stereocilia. When the cell depolarises, K⁺ channels open and more K⁺ enters the cell. When the cell hyperpolarises, K⁺ channels, which are normally partially open, close and inward movement of K⁺ is prevented.

4 Neural processing of auditory information

In this section we will look at how the frequency selectivity found along the basilar membrane is preserved or modified by the auditory nerve and how information about the intensity of the signal is encoded in the response of the auditory nerve fibres.

The nerve that communicates with or innervates the hair cells along the basilar membrane is called the vestibulocochlear nerve or VIIIth cranial nerve. It enters the brainstem just under the cerebellum and conveys information from the hair cells in the inner ear as well as from the vestibular organs of the inner ear. The cochlear portion of the nerve (the auditory nerve) contains two basic types of auditory nerve fibres: afferent fibres that carry information from the peripheral sense organ (organ of Corti) to the brain; and efferent fibres that bring information from the cerebral cortex to the periphery. Afferent fibres arise from nerve cell bodies in the spiral (or cochlear) ganglion (Figure 21) and contact the hair cells. The hair cells themselves do not have axons and therefore do not generate action potentials. Action potentials are first produced by the axons of afferent fibres. Recall that about 10 per cent of the ion channels are open when the hair cell is unstimulated (Section 3.5). This means that in the auditory nerve, there is a continuous low level of discharge of action potentials even when hair cells are unstimulated. Depolarisation of hair cells in response to stereocilia shearing causes an increase in the discharge rate of action potentials above this spontaneous rate (excitation) while hyperpolarisation of hair cells leads to a decrease in the discharge rate of action potentials below the spontaneous discharge rate (inhibition) (Figure 22).

Figure 21
Figure 21 Innervation of the organ of Corti. Afferent fibres arise from nerve cell bodies within the spiral ganglion. Ninety-five per cent of afferents contact inner hair cells, each of which constitutes the sole terminus for up to ten axons. Five per cent of afferents contact the outer hair cells
Figure 22
Figure 22 In unstimulated hair cells there is a low level of discharge of action potentials in the axons of the auditory nerve fibres. When the cell is stimulated, depolarisation results in an increase in the discharge rate of action potentials (excitation) while hyperpolarisation results in a decrease in discharge rate of action potentials (inhibition)

The inner hair cells are innervated by 95 per cent of the afferent fibres. In humans an average of 8 fibres innervate 1 inner hair cell. They therefore make a many-to-one connection with inner hair cells (Figure 21). The other 5 per cent of afferents innervate the outer hair cells.

5 Frequency coding in cochlear nerve fibres

5.1 Place code

We know that each hair cell occurs in a localised region of the cochlea, and that auditory nerve fibres contacting each hair cell fire action potentials in response to movement of the basilar membrane at that location. This means that the response of any given fibre should reflect the frequency selectivity of that location on the basilar membrane from which it comes. In other words, cochlear nerve fibres preserve the frequency selectivity found along the basilar membrane. Fibres on the outside of the auditory nerve bundle (those that innervate the basal hair cells) have high characteristic frequencies whereas those towards the middle of the nerve bundle (those that innervate the apex of the cochlea) have low characteristic frequencies. Thus, each place or location within the nerve responds ‘best’ to a particular frequency. The nerve fibres are spatially arranged to correspond directly to their basilar membrane origin. This arrangement is known as tonotopic organisation, which can be defined as the orderly spatial arrangement of neural elements corresponding to the separation of different frequencies. Functionally, tonotopic organisation allows the input frequency to be determined according to which nerve fibre discharges with the greatest relative discharge rate. This way of determining frequency is known as the place theory and gives rise to the place code. Tonotopic organisation is found at all higher levels of the auditory system up to and including the auditory cortex.
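The idea of reading off frequency from the fibre that discharges most strongly can be illustrated with a toy population. Everything numerical here is an assumption made for the sketch: the bank of 25 fibres spaced a quarter of an octave apart, the Gaussian fall-off of firing rate with distance from characteristic frequency, and the function names `fibre_rate` and `decode_place`.

```python
import math

# Hypothetical bank of fibres, quarter-octave spacing from 250 Hz to 16 kHz.
CHARACTERISTIC_FREQS = [250.0 * 2 ** (i / 4) for i in range(25)]


def fibre_rate(cf_hz, tone_hz, max_rate=200.0, bandwidth_octaves=0.3):
    """Toy firing rate: Gaussian fall-off in octaves away from the fibre's CF."""
    distance = math.log2(tone_hz / cf_hz)
    return max_rate * math.exp(-0.5 * (distance / bandwidth_octaves) ** 2)


def decode_place(tone_hz):
    """Return the CF of the fibre with the greatest discharge rate."""
    rates = [fibre_rate(cf, tone_hz) for cf in CHARACTERISTIC_FREQS]
    best = max(range(len(rates)), key=rates.__getitem__)
    return CHARACTERISTIC_FREQS[best]


if __name__ == "__main__":
    for tone in (500.0, 1000.0, 3000.0, 8000.0):
        print(f"tone {tone:7.0f} Hz -> most active fibre has CF "
              f"{decode_place(tone):7.0f} Hz")
```

In this sketch the precision of the place code is limited by the spacing of the fibres: a tone that falls between two characteristic frequencies is assigned to the nearer one.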

There are several ways in which the frequency selectivity of single fibres can be determined. One way is to present a single fibre with a wide range of stimuli of different frequencies but identical intensity. The function generated when responses to the stimuli are plotted against the frequency of each stimulus is called an iso-intensity contour. Figure 23a shows a number of such contours for a single fibre in the auditory nerve. Each curve is generated using a different intensity level of the stimulus.

Activity

What do you notice about the different contours?

Answer

The higher the intensity of the stimulus, the broader the contour.

Activity

What does this indicate in terms of the frequency selectivity of the fibre?

Answer

It means that at lower intensities, the fibre responds maximally to a narrow range of frequencies but as the intensity of the signal increases, the range of frequencies to which the fibre responds increases, i.e. the fibre shows lower frequency selectivity at high intensities.

A second way of displaying the tuning characteristics of single auditory fibres is to generate a tuning curve. This is done the same way as tuning curves for hair cells are generated (Section 3.7). Figure 23b is a tuning curve for an auditory nerve fibre from a guinea pig showing how the threshold intensity for a given fibre varies with stimulus frequency.

Activity

What is the characteristic frequency (CF) of this particular fibre?

Answer

About 18 kHz. The intensity of the stimulus needed to elicit a response is lower than for any other frequency (less than 40 dB SPL).

You can see that the high-frequency side of the curve is very steep whereas the low-frequency side is less steep and may have a long, low-frequency tail.

Activity

What does this indicate?

Answer

It means that nerve fibres are unlikely to respond to many frequencies higher than their characteristic frequency (CF) even when the intensity of the signal is very high. (At a frequency of about 23 kHz, the intensity of the signal needed to be about 70 dB to elicit a response.) For frequencies below CF the intensity of the signal needed to elicit a response is not quite so high and there is a broader range of frequencies to which the fibre will respond.

Figure 23
Figure 23 (a) Iso-intensity contours for a single fibre in the auditory nerve. For each contour generated, the intensity of the stimulus is kept constant while the frequency is varied. The number of spikes (action potentials) generated by the nerve fibre is then recorded for each frequency. The more spikes generated, the more sensitive or tuned the fibre is to that particular frequency. (b) Tuning curve for an auditory nerve fibre from a guinea pig

5.2 Frequency code

Although the evidence for the place theory of frequency coding is compelling, there is some question as to whether the tuning curves obtained from neurons in the auditory nerve provide a mechanism for frequency discrimination that is fine enough to account for behavioural data. People can detect remarkably small differences in frequency – in some cases as small as 3 Hz (for a 1000 Hz signal at moderate intensity). What accounts for this ability? As early as 1930, the American experimental psychologists Glen Wever and Charles Bray proposed that in response to a pure tone, the vibration of the basilar membrane matches the input frequency. They further suggested that the auditory receptors respond in such a way that the temporal pattern of basilar membrane vibration is reproduced in the firing pattern of the neuron. This could be achieved if auditory nerve fibres respond by firing one or more action potentials at the same time in every cycle of a pure tone. This is known as a phase-locked response, since the response appears locked to a certain point (e.g. the peak) in the stimulus (Figure 24a). By phase locking, the response pattern of the nerve fibre would accurately reflect the frequency of the sound wave. This is called the frequency code.

Figure 24
Figure 24 Phase locking. (a) The neuron is phase locked to the same point in every cycle of the pure tone stimulus. (b) The volley principle. The ensemble of fibre responses shown at the bottom of the figure has a pattern of firing that corresponds to the frequency of the input sound wave

This idea is attractive and there is evidence that it occurs, but only at low frequencies. The reason for this is that neurons cannot fire much faster than about 1000 action potentials per second (they have an absolute refractory period of about 1 ms and cannot fire twice in succession at intervals of less than 1 ms). Therefore they cannot fire during each cycle of the stimulus for stimuli above 1000 Hz (1 kHz). This realisation led Wever and Bray to propose the operation of a volley principle illustrated in Figure 24b. In this figure, the frequency of the sound wave illustrated on top is too high for a single fibre to fire on every cycle. According to the volley principle each fibre only fires at a certain point in the cycle although it does not respond to each cycle. Each of the eight fibres illustrated is firing in phase; that is, if on any cycle a given auditory nerve fibre does fire, it does so at the same relative position within the cycle. If the responses of all fibres are then combined, as may happen further up in the auditory system, then information regarding signal frequency is preserved. The bottom trace in the figure shows the combined responses of all eight fibres; while none of the individual fibres reproduces the pattern of the wave, the combined response is sufficient to reproduce the frequency of the incoming signal. Using this principle, fibres can phase lock to signals with frequencies up to 5 kHz thereby enhancing the transmission of information about stimulus frequency. Above this level, the variability inherent in neural firing becomes too great for such fine patterns to be resolved, and the frequency is probably coded for solely by the place code.
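The volley principle can be sketched with a deliberately idealised simulation. It assumes a 4 kHz tone and eight fibres that take strict turns, each firing at the peak of every eighth cycle. Real fibres skip cycles irregularly, so the round-robin schedule is an assumption made only to keep the example deterministic.

```python
TONE_HZ = 4000.0
N_FIBRES = 8
N_CYCLES = 64
PERIOD_MS = 1000.0 / TONE_HZ    # duration of one stimulus cycle in ms

# Fibre k fires at the same phase of cycles k, k + 8, k + 16, ...
spike_times_ms = {
    k: [cycle * PERIOD_MS for cycle in range(k, N_CYCLES, N_FIBRES)]
    for k in range(N_FIBRES)
}


def intervals(times):
    """Gaps between successive spikes."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Each fibre alone respects a refractory period of about 1 ms.
single_fibre_interval_ms = min(intervals(spike_times_ms[0]))

# Pooling all eight fibres gives one spike per stimulus cycle.
pooled_ms = sorted(t for times in spike_times_ms.values() for t in times)
pooled_interval_ms = min(intervals(pooled_ms))
pooled_rate_hz = 1000.0 / pooled_interval_ms

if __name__ == "__main__":
    print(f"Single fibre fires every {single_fibre_interval_ms} ms")
    print(f"Pooled volley rate: {pooled_rate_hz} Hz")
```

No single fibre in this sketch fires faster than its refractory period allows, yet the pooled spike train repeats at the frequency of the tone.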

The central nervous system therefore gains information about stimulus frequency in two ways. First there is the place code: the fibres are arranged in a tonotopic map such that position is related to characteristic frequency. Second, there is the frequency code: fibres fire at a rate reflecting the frequency of the stimulus. Below 50 Hz, it appears that frequency is encoded solely by the frequency code. Frequency coding is also of particular importance when the sound is loud enough to saturate the neural firing rate (Section 6). Fibres of many characteristic frequencies will respond to a loud signal because it will be above threshold even for fibres with characteristic frequencies that are different from the signal frequency (although they will respond less vigorously). However, frequency information will still be encoded in the temporal firing pattern of all stimulated fibres.

6 Intensity coding

6.1 Firing-rate hypothesis

Information about stimulus intensity is encoded in two ways: the firing rates of neurons and the number of active neurons.

Intensity is assumed to be encoded by an increase in discharge rate of action potentials within the auditory system. As the stimulus gets more intense, the basilar membrane vibrates at a greater amplitude causing the membrane potential of activated hair cells to be more depolarised and this causes the nerve fibres that synapse onto the hair cells to fire at a greater rate. However for single fibres, the discharge rate increases only for a relatively small range of level changes. Figure 25 shows the results of recordings from an auditory nerve fibre in response to a stimulus of increasing intensity.

Activity

What does the figure show with respect to the responses of the fibre as a function of the sound level of the stimulus?

Answer

The threshold of the fibre is about 25 dB SPL and the cell responds maximally to all sounds greater than about 65 dB SPL.

Figure 25
Figure 25 Intensity response function for a single auditory nerve fibre. The stimulus is a pure tone at the characteristic frequency of the neuron. As the intensity of the stimulus increases, so does the number of action potentials (spikes) generated per second

For this fibre a change in response with intensity only occurs over a range of about 40 dB, after which it no longer responds to increases in sound level with an increased firing rate: the fibre is said to be saturated. The range of sound levels between threshold and the level at which saturation occurs is called the dynamic range.

Human hearing spans a dynamic range of about 140 dB. Since a single fibre's discharge rate will only increase for a relatively small range of level changes (usually less than 35 dB), single fibre responses alone cannot encode changes in signal intensity across this whole range. However, if intensity is determined by an increase in discharge rate of a large number of fibres with different response thresholds, then a large dynamic range could be accommodated. The most sensitive nerve fibres have response thresholds of about 0 dB SPL and characteristically have high rates of spontaneous activity. They produce saturating responses for stimulation at moderate intensities, about 40 dB SPL. At the opposite extreme, some afferent fibres display less spontaneous activity and much higher thresholds and give graded responses to intensities of stimulation in excess of 100 dB SPL. Activity patterns of most fibres are between these two extremes. Thus combining information from low-, medium- and high-threshold fibres may serve as the code for sound level.
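A minimal sketch shows how pooling fibres with different thresholds could extend the range of levels that can be signalled. The rate-level function is assumed to be a straight line between threshold and saturation, spontaneous activity is ignored, and the three thresholds and the 35 dB dynamic range per fibre are hypothetical values chosen to echo the figures above.

```python
def fibre_rate(level_db, threshold_db, dynamic_range_db=35.0, max_rate=200.0):
    """Toy rate-level function: zero below threshold, flat above saturation."""
    fraction = (level_db - threshold_db) / dynamic_range_db
    return max_rate * min(1.0, max(0.0, fraction))


# Hypothetical low-, medium- and high-threshold fibres (dB SPL).
THRESHOLDS_DB = [0.0, 35.0, 70.0]


def pooled_rate(level_db):
    """Summed discharge rate across the three fibre types."""
    return sum(fibre_rate(level_db, t) for t in THRESHOLDS_DB)


if __name__ == "__main__":
    for level in range(0, 121, 20):
        single = fibre_rate(level, THRESHOLDS_DB[0])
        print(f"{level:3d} dB SPL: low-threshold fibre {single:5.1f}, "
              f"pooled {pooled_rate(level):5.1f} spikes per second")
```

The low-threshold fibre stops distinguishing levels once it saturates, whereas the pooled rate keeps rising until the highest-threshold fibre saturates as well.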

6.2 Number of neurons hypothesis

In addition to an increase in firing rate of neurons with differing dynamic ranges, the inclusion of discharges from many fibres whose CFs are different from the stimulus frequency may also help to account for the wide dynamic range of the ear. You know from Section 3.3 that in response to a pure tone stimulus the basilar membrane vibrates maximally at a given point. You should also be aware, however, that a pure tone will also cause vibration at points on the membrane adjacent to that of maximum stimulation. These vibrations are then reflected in the responses of the hair cells. Thus a pattern of excitation is produced in the auditory nerve such that fibres with a CF close to the input frequency will fire more strongly than those fibres whose CF is very different from the signal frequency. Figure 26 shows a pattern of neural excitation along the basilar membrane that may be produced by a pure tone of 80 dB SPL (solid line). Assume that the neurons most excited by this stimulus are firing at a maximum rate so that any increase in intensity of the stimulus causes no increase in the firing rate of these cells. However, the cells with a CF either higher or lower than the stimulus frequency are not firing at their maximum level, so increasing the stimulus level causes an increase in firing rate of these cells. The effect of this is to broaden the excitation pattern as shown by the dashed curve. Thus, intensity could be encoded by the breadth of the excitation pattern produced by a given stimulus.

Figure 26
Figure 26 The proposed mechanism for the coding of intensity. An increase in stimulus intensity results in an increased firing rate of neurons with characteristic frequencies below or above the stimulus frequency, but no change in the firing rate of the neurons most sensitive to the stimulus, since these are already saturated
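The broadening of the excitation pattern can be sketched numerically. The roll-off of 30 dB per octave, the fibre threshold, the dynamic range and the saturation rate used here are all assumed values for illustration; they are not taken from Figure 26.

```python
MAX_RATE = 200.0        # assumed saturation rate, spikes per second
ROLL_OFF_DB = 30.0      # assumed loss of drive per octave between CF and tone
THRESHOLD_DB = 20.0     # assumed fibre threshold
DYNAMIC_RANGE_DB = 40.0 # assumed single-fibre dynamic range


def fibre_rate(octaves_from_tone, level_db):
    """Firing rate of a fibre whose CF lies octaves_from_tone away from the tone."""
    effective_level = level_db - ROLL_OFF_DB * abs(octaves_from_tone)
    fraction = (effective_level - THRESHOLD_DB) / DYNAMIC_RANGE_DB
    return MAX_RATE * min(1.0, max(0.0, fraction))


def pattern_width(level_db, step=0.05):
    """Width, in octaves, of the region of fibres firing above zero."""
    offsets = [i * step for i in range(-100, 101)]
    active = [o for o in offsets if fibre_rate(o, level_db) > 0]
    return max(active) - min(active) if active else 0.0


for level in (80, 90):
    print(level, fibre_rate(0.0, level), round(pattern_width(level), 2))
```

In this sketch the fibre at the stimulus frequency is already saturated at 80 dB SPL, so raising the level to 90 dB SPL leaves its rate unchanged, while the width of the active region increases.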

Now read The transformation of sound stimuli into electrical signals by Robert Fettiplace attached below. This chapter reinforces some of the material you have studied so far on the transduction of sound stimuli and frequency coding.

Click View Document to open The transformation of sound stimuli into electrical signals by Robert Fettiplace

6.3 Summary of Sections 4 to 6

Hair cells do not have axons and therefore do not generate action potentials.

The nerve that communicates with or innervates the hair cells along the basilar membrane is known as the vestibulocochlear nerve or VIIIth cranial nerve. The cochlear portion of the nerve contains afferent fibres that carry information in the form of action potentials from the organ of Corti to the brain, and efferent fibres that bring information from the cerebral cortex to the periphery.

Most of the afferent fibres connect to inner hair cells with which they make a many-to-one connection. The nerve cells that innervate the hair cells at the apex of the cochlea are in the middle of the nerve bundle while fibres from the base of the cochlea make up the outside fibres of the nerve bundle. The frequency-to-place conversion seen in the cochlea is therefore preserved in the auditory nerve.

There are two main theories concerning the way in which the auditory system encodes the frequency of the signal: the frequency code and the place code. Evidence suggests that the frequency code operates for frequencies below 50 Hz whereas the place code operates at frequencies above 1000 Hz. Both appear to play a role for frequencies between these.

There are also two main theories regarding how the auditory system encodes intensity information: the firing rate of neurons and the number of neurons that fire.

7 The central auditory nervous system

7.1 The ascending auditory pathway

Up to now we have dealt with the anatomy of the auditory periphery and how the basic attributes of sound are coded within the auditory periphery. A great deal of additional processing takes place in the neural centres that lie in the auditory brainstem and cerebral cortex. Because localisation and other binaural perceptions depend on the interaction of information arriving at the two ears, we need to study the central auditory centres, since auditory nerves from the two cochleae interact only at the brainstem and cerebral cortex. This section deals with the structure and function of the central auditory nervous system (CANS).

Within the brainstem almost all fibres of the auditory nerve synapse on cells of the cochlear nucleus. The relationship between the cochlear nucleus and higher auditory centres is shown in Figure 27. This figure is highly schematic and simplified and shows only the main tracts and nuclei of the CANS, although other nuclei exist.

Figure 27
Figure 27 Highly schematic diagram of the bilateral central auditory pathway. The main pathways and nuclei are shown for both cochleae. Binaural stimulation occurs at the superior olive and all regions above

Once they leave the cochlear nucleus, most of the axons of the cochlear nucleus cells cross over to the opposite side (contralateral side) of the brain (Figure 27). This means that most of the auditory information processed by each half of the brain comes from the ear on the other side of the head. This is in contrast to that found in the visual system, where ganglion cell fibres either cross or stay on the same side of the brain in equal proportions. Both crossed and uncrossed fibres from the cochlear nuclei synapse in the area of the brainstem called the superior olivary complex. This is the first place in the ascending pathway to receive information from both ears. Neural impulses are transmitted from the superior olivary complex to the inferior colliculus through and/or around the lateral lemniscus (some fibres synapse in the lateral lemniscus but most travel through it to the inferior colliculus), from there to the medial geniculate body and finally to the auditory cortex. The location of the auditory cortex on the surface of the brain is shown in Figure 28.

Figure 28
Figure 28 The primary auditory cortex in humans

7.2 Coding of information in the higher auditory centres

We have seen that in the cochlear nerve, information about sound intensity is coded for in two ways: the firing rates of neurons and the number of neurons active. These two mechanisms of coding signal intensity are found throughout the auditory pathway and are believed to be the neural correlates of perceived loudness. The tonotopic organisation of the auditory nerve is also preserved throughout the auditory pathway; there are tonotopic maps within each of the auditory nerve relay nuclei, the medial geniculate nucleus (MGN, labelled medial geniculate body in Figure 27) and the auditory cortex. Conversion from frequency to position that originates on the basilar membrane is maintained all the way up to the auditory cortex. One source of information about sound frequency is therefore derived from tonotopic maps; the location of active neurons in the auditory nuclei and in the cortex is an indication of the frequency of a sound. Phase locking as a means of frequency coding is also present in centres further along the pathway.

There are, in fact, two distinct pathways that occur in the CANS:

  1. The ‘what’ pathway which is monaural and receives information from only one ear. This pathway is concerned with the spectral (frequency) and temporal (time) features of a sound and is hardly concerned with the spatial aspects. It focuses mainly on identifying and classifying different types of sound.

  2. The ‘where’ pathway which is binaural and receives information from both ears. It is involved in the localisation of a sound stimulus.

Despite the apparent dichotomy of these two processing pathways, the same types of acoustic cues may be important for the analysis that occurs in each. For example, spectral information is used in the ‘where’ pathway for determining a sound's elevation; and temporal information, used for our perception of frequency in the ‘what’ pathway, is also used in the ‘where’ pathway for determining a sound's horizontal location.

7.2.1 The ‘what’ pathway

The main nucleus involved in the ‘what’ pathway is the cochlear nucleus, which has three main components, each of which is tonotopically organised: cells with progressively higher characteristic frequencies are arrayed in an orderly progression along one axis (Figure 29). The cochlear nuclei contain neurons of several types, each of which encodes a specific parameter of a stimulus (frequency, intensity, time): stellate cells encode stimulus frequency and intensity; bushy cells provide information about the timing of acoustical stimuli and are involved in locating sound sources along the horizontal axis; and fusiform cells are thought to participate in the localisation of sound sources along a vertical axis.

Figure 29
Figure 29 The representation of stimulus frequency in the cochlear nucleus. Stimulation with two sounds of different frequency causes vibration of the basilar membrane at two different positions (top). This in turn excites two distinct populations of afferent fibres, which project onto the cochlear nucleus in an orderly fashion

7.2.2 The ‘where’ pathway

The ‘where’ pathway involves the ventral cochlear nuclei, the superior olivary complex and the inferior colliculus. The superior olivary complex is composed of the lateral superior olive (LSO) and the medial superior olive (MSO).

The neurons in the superior olivary complex are the first brainstem neurons to receive strong inputs from both cochleae and are involved in sound localisation.

The MSO receives excitatory inputs from the cochlear nuclei on both sides and is tonotopically organised. It is involved in the localisation of sound in the horizontal plane by processing information about auditory delays. Units in the MSO increase their firing rate in response to sounds from both ears as opposed to one ear, and these excitatory–excitatory (EE) units will increase their discharge rate further in response to sounds that reach both ears with a certain delay. In other words, a unit will discharge at the greatest rate when there is a particular interaural delay. This aids in localising sound in the horizontal plane.
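A unit that fires most for one particular interaural delay can be sketched as a simple coincidence counter. The binned spike trains, the internal delay and the 3-bin interaural delay are hypothetical, and the sketch makes no claim about how MSO neurons actually implement the comparison.

```python
def coincidences(left, right, internal_delay):
    """Count bins where a left-ear spike, delayed by internal_delay bins,
    coincides with a right-ear spike (both inputs are excitatory)."""
    count = 0
    for t, spike in enumerate(left):
        u = t + internal_delay
        if spike and 0 <= u < len(right) and right[u]:
            count += 1
    return count


# Hypothetical phase-locked input: one spike every 10 time bins.
left = [1 if t % 10 == 0 else 0 for t in range(100)]

# The sound reaches the left ear first; the right-ear input lags by 3 bins.
interaural_delay = 3
right = [0] * interaural_delay + left[:-interaural_delay]

# A bank of hypothetical units, each with a different internal delay.
responses = {d: coincidences(left, right, d) for d in range(-5, 6)}
best_delay = max(responses, key=responses.get)
print(responses)
print("unit with the greatest response has internal delay", best_delay)
```

Only the unit whose internal delay matches the interaural delay receives coincident input from both ears, so the identity of the most active unit indicates the delay and hence the horizontal position of the source.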

The LSO is also involved with sound localisation but instead of using interaural time delays, it employs intensity differences to calculate where a sound originated. Information from the ipsilateral (same side) inputs to the LSO is usually excitatory and results in an increase in discharge rate of the neuron. Contralateral stimulation of the LSO is usually inhibitory. Thus, stimulation from both ears may decrease the firing rate of the neuron relative to the firing rate when only the ipsilateral ear receives sound. These excitatory–inhibitory (EI) units discharge with a few spikes when there is approximately equal stimulation of both ears and discharge rate increases as a function of changing the interaural level difference. The LSO therefore appears to form a network for processing interaural level differences, which are used to determine the location of sound sources.
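The behaviour of an EI unit can be sketched as a rate that depends on the interaural level difference. The linear form, the gain, the baseline of a few spikes per second and the maximum rate are assumptions made for illustration, and a contralateral level of 0 dB SPL is used here to stand in for a near-silent ear.

```python
def lso_rate(ipsi_db, contra_db, gain=5.0, baseline=5.0, max_rate=150.0):
    """Assumed EI unit: ipsilateral input excites, contralateral input inhibits.

    The rate (spikes per second) grows with the interaural level difference
    (ipsilateral minus contralateral) and is clipped to [0, max_rate].
    """
    level_difference = ipsi_db - contra_db
    return min(max_rate, max(0.0, baseline + gain * level_difference))


print(lso_rate(60, 60))  # equal stimulation of both ears
print(lso_rate(70, 60))  # louder in the ipsilateral ear
print(lso_rate(60, 70))  # louder in the contralateral ear
print(lso_rate(60, 0))   # contralateral ear nearly silent
```

With these assumed parameters the unit fires weakly when both ears are stimulated equally, more strongly as the ipsilateral ear is favoured, and is silenced when the contralateral ear receives the more intense sound.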

The use of interaural time and intensity differences in sound localisation will be dealt with in more detail in Section 12.

The inferior colliculus is part of the tectum and is the most prominent nucleus in the brainstem. It receives inputs from the olivary complex and the cochlear nucleus. Units in the inferior colliculus appear to be mainly EI units although there are EE units as well. They are tonotopically organised in sheets of cells (as in the cochlear nucleus). Cells in different parts of the inferior colliculus are either monaural, in that they respond to input from one ear only, or binaural, responding to bilateral stimulation. Both the spectral processing that takes place in the cochlear nucleus and the binaural processing that occurs in the olivary complex are seen in the inferior colliculus. In fact, the inferior colliculus is the termination of nearly all projections from brainstem auditory nuclei. It is therefore a ‘watershed’ for information processing where the ‘what’ and ‘where’ pathways converge on a single tonotopic map. Outputs of the inferior colliculus project mainly to the medial geniculate nucleus.

The medial geniculate nucleus is also tonotopically organised. Neurons with the same characteristic frequency are arrayed in one layer, so that the nucleus consists of a stack of neural laminae that represent successive stimulus frequencies. Sensitivity to interaural time or intensity differences is maintained. Axons leaving the MGN project to the auditory cortex. The neural responses of cortical cells in response to sound have been studied extensively in primates. In general, neurons are relatively sharply tuned for sound frequency and possess characteristic frequencies covering the audible spectrum of frequencies. In electrode penetrations made perpendicular to the cortical surface, the cells encountered tend to have similar characteristic frequencies, suggesting columnar organisation on the basis of frequency, the so-called ice-cube model of the auditory cortex (Figure 30). Although most of the neurons in the primary auditory cortex are sensitive to stimulation through either ear, their sensitivities are not identical. Instead the cortex is divided into alternating strips of two types. Half of these strips contain EE neurons and respond more to stimulation from both ears than to either ear separately, and the other half consist of EI neurons which are stimulated by unilateral input but inhibited by stimulation from the opposite ear. The strips of EE and EI cells run at right angles to the axis of tonotopic mapping so that the primary auditory cortex is partitioned into columns responsive to every audible frequency and to each type of interaural interaction.

Figure 30
Figure 30 Ice-cube model of the auditory cortex

7.3 The descending auditory pathway

The auditory system transmits information from the cochlea to the auditory cortex. Another system follows a similar path, but in reverse, from the cortex to the cochlear nuclei. This is the descending auditory pathway. In general, the descending pathway may be regarded as exercising an inhibitory function by means of a sort of negative feedback. It may also determine which ascending impulses are to be blocked and which are allowed to pass to other centres in the brain. The olivocochlear bundle, which arises from the olivary complex, is involved in sharpening or otherwise modifying the analysis that is made in the cochlea.

7.4 Summary of Section 7

Fibres of the cochlear nerve synapse on the cells of the cochlear nuclear complex, which is the first station of the central auditory pathway. From here signals are sent to the superior olivary complex, through the lateral lemniscus to the inferior colliculus, then to the medial geniculate nucleus and finally to the auditory cortex. The central role of the auditory cortex is the processing of complex sounds.

Each cochlear nuclear complex receives input from only one ear. In the cochlear nuclear complex are several different neural types that are responsible for extracting information about the spectral and temporal features of incoming sound.

Neurons in the superior olivary complex (SOC) are the first to receive input from both ears and are thought to play an important role in sound localisation. The SOC processes information about interaural delays and intensities.

The inferior colliculus (IC) is a site for convergence of information. IC cells are organised in layers called sheets and within each sheet there appears to be a segregation of the EE and EI inputs. More complex aspects of a sound signal are processed in the IC and further features are extracted.

A tonotopic representation of frequency is seen at all levels of the auditory pathway.

7.5 More revision questions

Question 6

Describe how phase locking transmits information about the frequency of a sound (include the volley principle).

Answer

Phase locking is the consistent firing of an auditory neuron at the same phase of each cycle of a sound wave. At low frequencies, the neuron will fire action potentials at some constant location on the wave (peak or trough, for example) so that the frequency of the sound can be determined from the frequency of the neuron's action potentials. For higher-frequency sounds, neurons may not fire on every cycle even though they fire at the same point on the cycle. A group of such neurons can encode the frequency of the sound wave if their activity is pooled.
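The pooling described in this answer can be sketched as follows. The 800 Hz tone, the three neurons that each fire on every third cycle, and the fixed firing phase are hypothetical choices made only to illustrate the volley principle.

```python
def spike_times(freq_hz, n_cycles, fires_on, phase=0.25):
    """Spike times (s) of a neuron that fires at a fixed phase of the cycle,
    but only on those cycles for which fires_on(cycle) is true."""
    period = 1.0 / freq_hz
    return [(c + phase) * period for c in range(n_cycles) if fires_on(c)]


def mean_rate(times):
    """Mean firing rate (Hz) estimated from the intervals between spikes."""
    intervals = [b - a for a, b in zip(times, times[1:])]
    return len(intervals) / sum(intervals)


tone_hz = 800.0

# Three hypothetical neurons, each firing on every third cycle, staggered.
neurons = [
    spike_times(tone_hz, 30, lambda c, k=k: c % 3 == k)
    for k in range(3)
]

# Pool the activity of the group, as in the volley principle.
pooled = sorted(t for neuron in neurons for t in neuron)

print("single neuron:", round(mean_rate(neurons[0]), 1), "Hz")
print("pooled group: ", round(mean_rate(pooled), 1), "Hz")
```

In this sketch each neuron on its own fires at only a third of the tone frequency, but because every spike falls at the same phase of the cycle, the pooled spike train recovers the stimulus frequency.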

Question 7

The place theory of frequency coding is regarded as being of minimal use for very low frequencies. Why is this?

Answer

Recall that low frequencies create a rather broad or flat pattern of vibratory activity on the basilar membrane – nearly the entire membrane moves although the peak in the wave is towards the apex. High frequencies on the other hand create a wave that is very localised – only a small part of the membrane moves. This means that for low frequencies, the displacement pattern on the membrane is much less specific and localisable than the peak displacement at high frequencies.

Question 8

Describe how the movement of the basilar membrane provides the brain with information about a signal's (a) frequency and (b) intensity.

Answer

(a) The travelling wave on the basilar membrane has a peak amplitude at a location determined by the frequency of the incoming sound wave. For low-frequency sounds the peak is located at the apical end of the membrane while for high-frequency sounds the peak is located at the basal end. Hair cells are located along the basilar membrane. Cells that are located at the position where the wave peaks are stimulated, resulting in the generation of action potentials in the auditory nerve fibres contacting those hair cells. Fibres on the outside of the auditory nerve innervate the basal hair cells and therefore fire in response to high-frequency sounds whereas fibres on the inside of the auditory nerve innervate apical hair cells and therefore fire in response to low-frequency sounds. So when a sound of a certain frequency stimulates the basilar membrane, the brain receives information about the frequency of the sound from which fibres in the auditory nerve are firing action potentials at the highest relative rate.

(b) The higher the intensity of the sound impinging on the ear, the greater the amplitude of the wave produced on the basilar membrane. The higher amplitude wave causes nerve fibres to fire at a greater rate (the number of action potentials per second is higher for a high-amplitude wave compared to a low-amplitude wave) or causes more neurons to fire (auditory fibres have different thresholds and the greater the displacement of the membrane the greater the number of neurons that reach threshold).

Question 9

Use diagrams similar to Figure 24 to illustrate how the firing pattern of auditory neurons connected to two different (but close together) positions on the basilar membrane could encode information about the frequency and intensity of the following signals: (a) a low-intensity low-frequency tone; (b) a low-intensity high-frequency (but < 1 kHz) tone; (c) a high-intensity low-frequency tone; and (d) a high-intensity, high-frequency (but < 1 kHz) tone.

Answer

Frequency is determined by the place code (which neurons are firing) and by a temporal code (neurons fire in bursts that phase lock to the stimulus frequency). Intensity is determined by the firing rate (more spikes per burst for louder sounds) and the number of neurons (at high intensity, more spikes are produced from both positions); see Figure 31.

Figure 31
Figure 31 Answer to Question 9

Question 10

Describe the method used to determine the characteristic frequency of a single auditory nerve fibre.

Answer

To determine the characteristic frequency of an auditory nerve fibre you would construct a tuning curve. Tuning curves indicate the sound pressure level at the eardrum that is just sufficient to elicit a detectable increase in the firing rate of an auditory nerve fibre, as a function of the frequency of a pure tone stimulus. For each fibre the lowest intensity of a pure tone that will produce a detectable response across a range of pure tones is determined. The frequency of the tone for which the threshold of a given fibre is lowest is called the critical or characteristic frequency.
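Reading the characteristic frequency from a tuning curve amounts to finding the frequency with the lowest threshold. The threshold values in this minimal sketch are hypothetical and are not taken from any figure in the course.

```python
# Hypothetical tuning curve for one fibre: tone frequency (Hz) mapped to the
# lowest sound level (dB SPL) that produced a detectable increase in firing.
tuning_curve = {
    500: 70,
    1000: 45,
    1500: 28,
    2000: 15,
    2500: 30,
    4000: 80,
}

# The characteristic frequency is the frequency with the lowest threshold.
characteristic_frequency = min(tuning_curve, key=tuning_curve.get)
print("characteristic frequency:", characteristic_frequency, "Hz")
print("threshold at CF:", tuning_curve[characteristic_frequency], "dB SPL")
```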

Question 11

Where in the higher auditory centres does binaural processing of information begin? What is the nature of the information used that enables us to localise sounds? How does the operation of excitatory and inhibitory inputs enable the auditory system to use this information?

Answer

Binaural processing of information begins in the superior olivary complex, using interaural time delays and interaural intensity differences. In the MSO, neurons increase their firing rate in response to sounds from both ears, and will increase their discharge rate even further when sounds reach both ears with a certain delay (interaural time delays, EE units). In the LSO, neurons increase their firing rate in response to sounds in the ipsilateral ear and are inhibited from firing by sounds in the contralateral ear. So stimulation from both ears may decrease the firing rate of neurons compared to stimulation by one ear (EI units).

Activity 1 The Senses and Hearing animation

This is a good time to break from reading and look at two interactive activities. The first involves exploring the structure and function of the ear, the middle and inner ear, the auditory pathways to the brain and the auditory cortex. There is also an explanation of sound in this activity. The second activity is to look at an animation of the transmission of sound from the outer to the inner ear.

8 Auditory perception

8.1 Introduction

We have learned so far that physical energy from the environment is transduced into electrochemical messages that affect the nervous system and give rise to psychological experiences, that is, produce sensations and perceptions. Sensation refers to the initial process of detecting and encoding environmental energy. The first step in sensing the world is performed by receptor cells, which in the case of hearing are the hair cells in the cochlea. Perception, on the other hand, generally refers to the result of psychological processes in which meaning, relationships, context, judgement, past experience and memory all play a role. In many meaningful environmental encounters, however, it is difficult to make such a clear distinction between sensation and perception. For example, when we hear a tune are we aware of any isolated tonal qualities of the notes, such as pitch and loudness, distinct from the melody? In most instances, perception and sensation are unified, inseparable psychological processes. In the next section we will look at an essential tool that has been used to study the quantitative relationship between environmental stimulation (the physical dimension) and sensory experience (the psychological dimension). We will then go on to examine the sensory or psychological effects produced by simple sounds and finally, look briefly at the reception of sound as meaningful information that allows us to perceive spatial features such as localisation.

9 Psychophysics

9.1 Introduction

Psychophysics is the oldest field of the science of psychology. It stems from attempts in the nineteenth century to measure and quantify sensation. It attempts to quantify the relationship between a stimulus and the sensation it evokes, usually for the purpose of understanding the process of perception. Historically, psychophysics has centred around three general approaches. The first involves measuring the smallest value of some stimulus that a listener can detect – a measure of sensitivity known as a threshold. The second is discrimination, where the subject is presented with two or more stimuli (e.g. two tones of different frequency) and then asked whether the stimuli are different. The third approach involves directly asking the listener about the stimulus. These are usually called scaling procedures.

The stimuli used in psychophysical tests can be varied along a number of dimensions. For example if the stimulus is light, it could vary in wavelength, size or shape; if it is sound it could vary in intensity, frequency or duration, etc. The response to the stimulus may be a verbal report (‘yes I see it’, ‘no I don't see it’, ‘these two appear the same’) or a mechanical response, such as pressing a button.

To estimate absolute thresholds and discrimination abilities, two basic classical psychophysical methods are used. The first of these is the method of limits and the second, the method of constant stimuli.

9.2 Absolute thresholds

The absolute threshold or absolute limen is the smallest value of a stimulus that an observer can detect. The concept of an absolute threshold assumes there is a precise point on the intensity or energy dimension that, when reached, becomes just perceptible to the observer and he or she responds ‘yes – I can detect the stimulus’. It follows that when the stimulus is one unit weaker it will not be detected. If this were the case then some form of hypothetical curve, like the one shown in Figure 32 would be the result. However this rarely happens, as illustrated in the following example where an auditory threshold is derived using a traditional psychophysical method, the method of limits.

Figure 32
Figure 32 A hypothetical curve linking stimulus intensity to absolute threshold. The vertical axis plots the number of trials in which the subject responds ‘yes’, he/she can detect the stimulus. The threshold value is 14.0 dB SPL, i.e. below 14.0 dB SPL the stimulus is not detected

9.2.1 Method of limits

To determine an auditory threshold using the method of limits, one would begin with an undetectable stimulus and then gradually increase the intensity until the subject detects it. Results from a hypothetical method of limits study are shown in Table 1. Stimulus intensity is shown in the first column and the subject's response to each stimulus is listed under ‘trial 1’. Only when the stimulus was 14 dB SPL did the subject respond ‘yes’ (I can hear the stimulus). The threshold for hearing the stimulus therefore lies somewhere between 13.5 and 14.0 dB SPL. Column 3 shows a repeat of the experiment. You can see that the responses of the subject in trial 2 were not the same as in trial 1. In this case the subject failed to detect the stimulus at a level of 14 dB SPL. As the experiment is repeated (trial 3 and trial 4) the responses differ from trial to trial. Loud stimuli, at intensities of 15 dB SPL and above, are always heard whereas very soft stimuli (13 dB SPL and below) are never heard. Between these extremes, responses vary and stimuli are heard only a certain percentage of the time: stimuli at a level of 14.5 dB SPL were heard in 3 out of 4 trials (75%), whereas a stimulus level of 13.5 dB SPL was heard in only 1 out of 4 trials (25%). Responses may differ because actual thresholds change from trial to trial or because there is a variable amount of extraneous ‘noise’. We shall return to this problem later. The percentage detection for each stimulus is shown in the last column. If we plot the percentage of stimuli detected against stimulus intensity we get a graph similar to that shown in Figure 33 – a smooth S-shaped curve known as a psychometric function. It is usual to define the threshold stimulus as that stimulus intensity corresponding to a 50% detection on the psychometric function.

Activity

In this case, what is the threshold for our subject?

Answer

According to the graph, a 50% detection corresponds to a signal intensity of 14.0 dB SPL.

Table 1 Results from a hypothetical method of limits study (Y = ‘yes’ response, N = ‘no’ response)
Stimulus/dB SPL   trial 1   trial 2   trial 3   trial 4   % detection (‘yes’ responses)
12.0              N         N         N         N         0
12.5              N         N         N         N         0
13.0              N         N         N         N         0
13.5              N         N         Y         N         25
14.0              Y         N         Y         N         50
14.5              Y         Y         Y         N         75
15.0              Y         Y         Y         Y         100
15.5              Y         Y         Y         Y         100
16.0              Y         Y         Y         Y         100
16.5              Y         Y         Y         Y         100
17.0              Y         Y         Y         Y         100

Figure 33
Figure 33 A typical threshold function. By convention, absolute threshold is defined as the intensity at which the stimulus is detected 50% of the time
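The derivation of the threshold from Table 1 can be sketched in a few lines. The responses are those listed in Table 1; the use of straight-line interpolation between neighbouring points is an assumption made here for simplicity, since the course only describes reading the 50% point from the graph.

```python
# Responses from Table 1: one character per trial (Y = 'yes', N = 'no').
TABLE_1 = {
    12.0: "NNNN", 12.5: "NNNN", 13.0: "NNNN", 13.5: "NNYN",
    14.0: "YNYN", 14.5: "YYYN", 15.0: "YYYY", 15.5: "YYYY",
    16.0: "YYYY", 16.5: "YYYY", 17.0: "YYYY",
}


def percent_detected(responses):
    """Percentage of trials on which the subject responded 'yes'."""
    return 100.0 * responses.count("Y") / len(responses)


def absolute_threshold(table, criterion=50.0):
    """Stimulus level at which detection first reaches the criterion,
    found by linear interpolation between neighbouring levels."""
    points = sorted((level, percent_detected(r)) for level, r in table.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 > y0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


for level in sorted(TABLE_1):
    print(level, percent_detected(TABLE_1[level]))
print("threshold (dB SPL):", absolute_threshold(TABLE_1))
```

The percentages printed by this sketch reproduce the last column of Table 1, and the interpolated 50% point agrees with the 14.0 dB SPL threshold given in the answer above.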

Although very useful, the method of limits is open to various sources of bias and error. One of its drawbacks is that the change in stimulus intensity (increase or decrease) is orderly and regular.

Activity

How may this affect the observer?

Answer

At any point the subject knows how intense a stimulus to expect next. As the series of presentations progresses, the expected intensity changes and the subject knows that the next stimulus will be more (or less) intense than the previous one. This could bias him or her to report a ‘yes’ when in fact the stimulus cannot be heard.

9.2.2 Method of constant stimuli

This method is similar to that described above but has two advantages over the method of limits. The first is that it is designed to overcome the bias inherent in presenting stimuli in a set order. This is done by randomising the order of presentation of stimuli. The subject therefore has no way of anticipating the intensity of the next stimulus (it could be softer or louder than the preceding one). Using the intensities in Table 1, the stimuli would be presented in a random order: for example, 13 dB SPL, 17 dB SPL, 12.5 dB SPL, etc. until each of the intensities is presented a sufficient number of times. Once again a psychometric function is generated and the intensity of the stimulus value detected on 50 per cent of the trials is used as the measure of absolute threshold.

The method of constant stimuli has a second advantage over the method of limits in that the experimenter can get an estimate of the listener's bias by including ‘catch’ or ‘blank’ trials in the sequence. That is, occasionally the experimenter presents no stimulus to the listener. The listener does not know this has happened, and so the response is still ‘yes’ or ‘no’. If there is a bias towards either response, then the proportion of ‘yes’ responses in the blank trials will reflect this bias.
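A randomised trial sequence with catch trials, and the bias estimate derived from it, might be sketched as follows. The number of repeats, the number of catch trials, the seed and the two simulated listeners are all hypothetical; `None` is used here to mark a blank trial.

```python
import random

# Stimulus levels as in Table 1: 12.0 to 17.0 dB SPL in 0.5 dB steps.
LEVELS_DB = [12.0 + 0.5 * i for i in range(11)]


def build_sequence(levels, repeats, n_catch, seed=0):
    """Each level presented `repeats` times plus n_catch blank trials
    (marked None), all in random order."""
    trials = [level for level in levels for _ in range(repeats)]
    trials += [None] * n_catch
    random.Random(seed).shuffle(trials)
    return trials


def false_alarm_rate(trials, responses):
    """Proportion of 'yes' responses on the blank (catch) trials."""
    blanks = [said_yes for t, said_yes in zip(trials, responses) if t is None]
    return sum(blanks) / len(blanks) if blanks else 0.0


sequence = build_sequence(LEVELS_DB, repeats=4, n_catch=6)

# Two hypothetical listeners: one never says 'yes' to a blank trial,
# the other says 'yes' on every trial regardless of the stimulus.
unbiased = [t is not None and t >= 14.0 for t in sequence]
yes_sayer = [True] * len(sequence)

print(false_alarm_rate(sequence, unbiased))
print(false_alarm_rate(sequence, yes_sayer))
```

A high proportion of ‘yes’ responses on the blank trials, as for the second listener, would indicate a bias towards reporting a stimulus whether or not one was presented.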

9.3 Differential sensitivity

Absolute thresholds represent only one type of threshold; one could also ask whether the subject can detect a difference between two stimuli. The threshold for detection of difference is called a difference threshold or difference limen (DL). The difference threshold is a measure of the smallest detectable difference between two stimuli. Basically it answers the psychophysical question: ‘How different must two stimuli (e.g. two weights, two colours, two sounds) be from each other in order to detect them as different stimuli?’

The difference threshold, like the absolute threshold described at the beginning of this section, is a derived statistical measure; it is the difference in magnitude between two stimuli, usually a standard (S) and a comparison stimulus (T), that is detected 50 per cent of the time. For example, if two tones of the same intensity are presented to a listener, the listener will generally report that they are equal in loudness. However, as the intensity of one of the tones is gradually increased, an intensity difference between the tones will be reached at which they will be judged different in 50 per cent of trials. The magnitude of this difference specifies the difference threshold; that is, the amount of change in a stimulus necessary to produce a just noticeable difference (JND) in sensation. If the magnitude of a stimulus, say a sound, is 100 dB SPL, and the sound has to be increased to 110 dB SPL in order to be perceived as different, then the JND equals 10 dB.

The measurement of a JND can be made using any of the classical methods discussed above. A psychometric function similar to that shown in Figure 33 is generated except that the horizontal axis is ΔI instead of I. ΔI is the increment in intensity that, when added to the stimulus intensity I, produces a JND or the smallest detectable increment. From the psychometric function, a threshold value, the difference threshold, can be deduced. This difference threshold is the ΔI appropriate to the standard (S) used. For a different S, a different psychometric function is generated, and a different ΔI derived. By measuring the ΔI for a large number of standards you can develop a function that describes how the JND changes for different levels of stimulation.

9.3.1 Weber's Law

Pioneering work on the relationship between ΔI and S was done by the German physiologist Ernst Weber in the 1830s. Weber found that the increment in stimulation required for a JND was proportional to the size of the stimulus. Weber had subjects lift a small ‘standard’ weight (S) and then lift a slightly heavier ‘comparison’ weight (T) and judge which was heavier. He found that when the difference between the standard and comparison weights was small, the subjects found it difficult to detect a difference between the weights, but could easily detect large differences – not really surprising. However, he also found that the size of the JND depended on the size of the standard weight. For example, the JND for a 100 g standard weight was 5 g (= ΔI). In other words, the subject could tell the difference between a 100 g standard and a 105 g comparison weight but couldn't tell the difference between a 100 g weight and a comparison weight less than 5 g heavier. In contrast, the JND for a 200 g weight was found to be 10 g (= ΔI); the subject could only detect a difference between a standard 200 g weight and a comparison weight of 210 g or more. Thus, as the magnitude of the stimulus increases, so does the size of the JND.

Research on a number of senses, including hearing, has shown that the JND is larger for larger standard stimuli and that, over a fairly large range of intensities, the ratio of JND to the standard stimulus is constant, i.e.

ΔI / I = K

This is called Weber's law, which states that the bigger the stimulus, the bigger the increment needed for a change to be detectable. Here ΔI is the JND, I is the magnitude of the standard stimulus and K is a constant called the Weber fraction. Applying the equation to our example of lifted weights, we find that for a 100 g standard, K = 5/100 = 0.05.

Activity

What does K equal for a 200 g weight?

Answer

K = 10 g / 200 g = 0.05.

Therefore the Weber fraction is constant. What this means is that if I increases, ΔI must increase correspondingly. So, if the standard intensity is low, the increment of change necessary to produce a JND is correspondingly small; by contrast, if the initial intensity is high, the stimulus increment necessary for the JND is correspondingly large.

Weber's law holds (that is, the Weber fraction stays constant) for most senses, as long as the stimulus intensity is not too close to threshold.
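
The arithmetic of the lifted-weights example can be sketched in a few lines of Python. The function names are illustrative, and the 400 g standard is a hypothetical extrapolation that assumes the Weber fraction stays constant at that weight.

```python
def weber_fraction(delta_i, standard):
    """Weber fraction K = delta I / I."""
    return delta_i / standard


def predicted_jnd(k, standard):
    """JND predicted by Weber's law for a given standard: delta I = K * I."""
    return k * standard


k_100 = weber_fraction(5, 100)    # 100 g standard, 5 g JND (from the text)
k_200 = weber_fraction(10, 200)   # 200 g standard, 10 g JND (from the text)
print(k_100, k_200)

# Hypothetical 400 g standard, assuming K is unchanged.
print(predicted_jnd(k_100, 400))
```

Both standards give the same Weber fraction, and under that assumption a 400 g standard would need an increment of about 20 g to be noticed.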

10 The perception of intensity

10.1 Absolute thresholds

The human ear has incredible absolute sensitivity and dynamic range. The most intense sound we can hear without immediate damage to the ear is at least 140 dB above the faintest sound we can just detect. This corresponds to an intensity ratio of 100 000 000 000 000:1. In this section, we examine how the loudness of a sound can be measured and how the perception of loudness is affected by the intensity and duration of the signal.
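
The link between a level difference in decibels and an intensity ratio can be checked with a short Python sketch. It assumes the standard definition of the decibel for intensity (level difference = 10 × log10 of the intensity ratio); the function names are illustrative.

```python
import math


def intensity_ratio(level_difference_db):
    """Intensity ratio corresponding to a level difference in dB."""
    return 10 ** (level_difference_db / 10)


def level_difference_db(ratio):
    """Level difference in dB corresponding to an intensity ratio."""
    return 10 * math.log10(ratio)


print(intensity_ratio(140))        # a 1 followed by 14 zeros
print(level_difference_db(1e14))   # back to about 140 dB
```

A 140 dB range therefore corresponds to a ratio of 10 to the power 14, which is the figure quoted above.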

You know from Section 9.2 that the absolute threshold is the smallest value of some stimulus that a listener can detect. In order to investigate our perceptual capabilities, it is useful to generate an absolute threshold curve, which relates the frequency of a signal to the intensity at which it can be detected by the ear. Figure 34 is a plot of the thresholds of hearing for a range of frequencies.

Figure 34
Figure 34 Human auditory thresholds as a function of frequency. Sounds that fall in the shaded region below the curve are below threshold and therefore inaudible

Activity

From the graph, describe the relationship between signal frequency and the threshold of audibility of a signal.

Answer

Auditory thresholds are lowest for tones around 3000 Hz and increase for tones of higher and lower frequency.

This means that people are most sensitive to tones of frequencies around 3000 Hz, with sensitivity decreasing for tones that are either higher or lower in frequency. There will be very high and very low frequencies to which, no matter how intense the sinusoidal wave, the auditory system is insensitive. These frequency limits define the bounds of the auditory system's sensitivity to frequency.

In order to generate an audibility curve like that shown in Figure 34, you would determine the level required for a listener to detect the presence of a sinusoidal wave at each of many frequencies. One method of doing this involves delivering the sound using loudspeakers and measuring the sound pressure at the entrance to the auditory meatus at threshold. A threshold measured in this way is known as a minimum audible field (MAF). In contrast when sounds are delivered through earphones the threshold measured is called the minimum audible pressure (MAP). MAP thresholds are plotted as a function of frequency in Figure 34.

The threshold sound levels displayed in Figure 34 produce extremely small physical displacements at both the tympanic and basilar membranes. In humans, sound levels at the frequencies to which we are most sensitive cause movements of the basilar membrane of about 0.2 nm – about the diameter of two hydrogen atoms.

Figure 34 shows auditory thresholds for young people. As we grow older, we become less sensitive to stimuli of all frequencies, but the maximum hearing losses occur for high frequency tones.

Activity

Can you think of a reason why this may be the case?

Answer

Remember that low frequencies stimulate the basal end of the basilar membrane, even though the peak in displacement is towards the apex. The basal end, which is where high frequencies produce their peak displacement, is therefore stimulated by all frequencies to some extent, and the hair cells in this region have a greater potential for being ‘worn out’.

10.2 The relationship between loudness and intensity

The loudness of an auditory stimulus is a psychological, not a physical, attribute of the stimulus. The physical attribute of sound that is most closely correlated with loudness is intensity. So loudness is the listener's subjective description of the intensity of the stimulus. As you know, we are not equally sensitive to sounds of all frequencies, so the perceived loudness of a tone in fact depends on frequency as well as intensity. Two sounds can have the same physical sound pressure level but, if they are of different frequencies, they are often perceived as having different loudness.

How do we measure loudness? To do so, we have to relate a subjective quality such as loudness to a physical quantity such as sound pressure level. One way of doing this is to generate a plot of equal loudness contours. A 1000 Hz tone is set to some specific intensity and then the sound levels at which tones of other frequencies are equal in loudness to the 1000 Hz tone are determined. For example, a subject may be presented with a 1000 Hz standard tone at 60 dB SPL and then asked to manipulate the intensity of a 2000 Hz tone until it matches the loudness of the 1000 Hz tone. The same 1000 Hz tone would then be compared with a 3000 Hz tone, whose intensity is manipulated until it matches the 1000 Hz tone in loudness. In this way, the intensities of tones at a variety of frequencies can be obtained such that all the tones match the loudness of the 60 dB SPL, 1000 Hz tone. These intensities are then plotted as a function of frequency to generate equal loudness contours as shown in Figure 35.

The unit used to measure the loudness level of a signal is the phon. The loudness level in phons is the level in dB SPL of an equally loud 1000 Hz tone. So all tones judged equal in loudness to a 40 dB SPL, 1000 Hz tone have a loudness level of 40 phons. Tones presented at levels such that they are equal in loudness to a 70 dB SPL, 1000 Hz tone all have loudness levels of 70 phons, and so on. The equal loudness contours in Figure 35 form the phon scale.

Look at the contour labelled 30 in Figure 35. Any sound whose frequency and intensity lie on the contour sounds just as loud as any other sound on the contour, although the frequencies and intensities of the two sounds will differ. So, for example, a 60 Hz tone at a 65 dB intensity level and a 330 Hz tone at a 40 dB intensity level both sound as loud as a 1000 Hz tone at a 30 dB intensity level, and all three have a loudness level of 30 phons.

Figure 35
Figure 35 Equal loudness contours. The bottom curve, 0 phons, shows the absolute sensitivity of the ear as a function of frequency. All tones lying on the same loudness contour sound equally loud, although their intensities (and frequencies) may differ, and are assigned the same value of phons

Activity

What do you notice about the contours as loudness level increases from 0 to 120 phons?

Answer

The contours change in shape and become much flatter.

What this indicates is that at high intensity levels, the frequency of a sound becomes less important in the perception of loudness. Look at the 120 phon contour. In order to sound equally loud, a 60 Hz tone, a 300 Hz tone and a 1000 Hz tone need to differ by a maximum of 5 dB in intensity (125 dB SPL, 123 dB SPL and 120 dB SPL respectively). This contrasts with the 30 phon contour where, in order to sound equally loud, the three tones differ by as much as 35 dB. This in effect means that we are relatively more sensitive to low-frequency tones at high loudness levels than at low loudness levels. Below about 70 phons, low-frequency tones require a higher intensity to achieve comparable loudness with higher-frequency tones. This is especially true for sounds with frequencies below about 1000 Hz.
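
The flattening of the contours can be made concrete with a small Python sketch that uses only the levels quoted in the text for the 30 phon and 120 phon contours. The dictionary layout and the function name level_spread are illustrative choices.

```python
# Levels in dB SPL quoted in the text, keyed by loudness level (phons)
# and then by tone frequency (Hz).
contours = {
    30: {60: 65, 330: 40, 1000: 30},
    120: {60: 125, 300: 123, 1000: 120},
}


def level_spread(levels_by_frequency):
    """Difference between the highest and lowest level on one contour."""
    levels = levels_by_frequency.values()
    return max(levels) - min(levels)


for phons, levels in contours.items():
    print(phons, "phons:", level_spread(levels), "dB spread")
# prints "30 phons: 35 dB spread" then "120 phons: 5 dB spread"
```

The smaller spread at 120 phons is what is meant by the contour being flatter.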

Because perceived loudness depends on both frequency and intensity, complex sounds with identical frequency content may sound different in quality when heard at different levels. You may have experienced this if you have listened to voices from a loudspeaker at full volume. Because we are relatively more sensitive to low-frequency sounds at high loudness levels, the voices seem to have much greater low-frequency components, giving them a ‘boomy’ sound. Similarly, musical recordings made at high volume and then played softly often seem to be lacking in the bass range. This is because at low intensity levels we are relatively less sensitive to low-frequency tones and so the music sounds ‘tinny’. Many stereos compensate for this effect by having a ‘loudness’ switch that adds extra bass at low volume levels.

10.3 Intensity discrimination

The smallest detectable change in intensity has been measured using a variety of psychophysical methods and various stimuli. Although the difference threshold depends on several factors including duration, intensity and the kinds of stimuli on which the measurement is made, Weber's law holds for most stimuli. In other words, the smallest detectable change is a constant fraction of the intensity of the stimulus. Expressed in dB, the minimum change in intensity that produces a perceptual difference is about 0.5 to 1.0 dB. However, for pure tones Weber's law does not hold exactly, in that discrimination, as measured by the Weber fraction, improves at high levels. For a 1000 Hz tone, the difference threshold ranges from 1.5 dB at 20 dB SPL to 0.3 dB at 80 dB SPL.
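
To see why a smaller difference threshold in dB means a smaller Weber fraction, a level change in dB can be converted to ΔI / I. The sketch below assumes the standard decibel definition for intensity, so that a just detectable level change ΔL satisfies ΔL = 10 × log10((I + ΔI) / I); the function name is illustrative.

```python
def weber_fraction_from_db(delta_level_db):
    """Weber fraction (delta I / I) for a just detectable level change in dB."""
    return 10 ** (delta_level_db / 10) - 1


# Difference thresholds quoted in the text for a 1000 Hz tone.
for spl, dl_db in [(20, 1.5), (80, 0.3)]:
    k = weber_fraction_from_db(dl_db)
    print(f"{spl} dB SPL: DL = {dl_db} dB, Weber fraction = {k:.3f}")
```

On this conversion the Weber fraction falls from roughly 0.41 at 20 dB SPL to roughly 0.07 at 80 dB SPL, rather than staying constant as Weber's law would require.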

11 The perception of frequency

11.1 The relationship between frequency and pitch

Although the perception of sound involves the interaction of frequency and intensity, many aspects of frequency reception can be analysed separately.

For normal or typical hearing, the limits of hearing for frequency fall between 20 and 20 000 Hz. Below 20 Hz only a feeling of vibration is perceived; above 20 000 Hz, only a ‘tickling’ is experienced.

As well as loudness, the other most obvious characteristic of a sound is its pitch. Pitch is a subjective dimension of hearing. It is the sound quality most closely related to the frequency of a pure tone. High-frequency tones are perceived as being of high pitch while low-frequency tones are said to be of low pitch. The relationship between pitch and frequency is, however, not a simple linear one.

In order to investigate how the two are related, pitch has been assigned an arbitrary unit, the mel. The pitch of a 1000 Hz tone at 40 dB SPL has been given a fixed value of 1000 mels. In order to determine the number of mels associated with tones of different frequencies, a subject is presented with a 1000 Hz tone and told that its pitch is 1000 mels. The subject is then asked to manipulate the frequency of a second tone until it has a pitch that is one half as high as that of the 1000 mel tone. This tone is then assigned a value of 500 mels. The subject is then asked to find a frequency that has half the pitch of the 500 mel tone, which is then assigned a value of 250 mels. In this way a function relating frequency to mels can be generated (Figure 36).

Figure 36 shows that pitch is not related to frequency in either a linear fashion or a logarithmic fashion (note that frequency is plotted on a logarithmic scale); the relationship is more complex. In general, pitch increases more rapidly than frequency for tones below 1000 Hz and less rapidly for tones above 1000 Hz. That is, for frequencies above 1000 Hz a greater change in frequency is needed to produce a corresponding change in pitch.

Figure 36
Figure 36 Pitch in mels plotted against frequency (in Hz). The curve shows that the perceived pitch of a tone varies with frequency

11.2 Frequency discrimination

Some findings indicate that, for moderate loudness levels, humans can detect a frequency change of about 1 to 3 Hz for frequencies up to about 1000 Hz. Figure 37 shows a plot of the smallest frequency difference for which two tones can be discriminated for a number of reference tones. You can see from the figure that up to about 1000 Hz, the DL is between 1 and 3 Hz. In fact, for frequencies between 500 and 2000 Hz, discriminability is a constant fraction of the frequency to be discriminated. In other words, the Weber fraction (ΔF / F) for this frequency interval remains constant, at approximately 0.002. Although this holds true for a wide range of intensities, the intensity of the sound does affect the minimal discriminable change in frequency. The DL for frequency increases as the stimulus intensity decreases. In other words, as the intensity of a sound decreases, it becomes more difficult to detect it as being different from other sounds close to it in frequency.
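
The constant Weber fraction for frequency can be turned into predicted difference limens with a short Python sketch. It uses the value of 0.002 and the 500 to 2000 Hz range given above; the function name and the decision to reject frequencies outside that range are illustrative choices.

```python
WEBER_FRACTION_FREQUENCY = 0.002   # delta F / F, value quoted in the text


def frequency_dl(frequency_hz, k=WEBER_FRACTION_FREQUENCY):
    """Predicted difference limen (Hz), valid only where the text says
    the Weber fraction is constant (500 to 2000 Hz)."""
    if not 500 <= frequency_hz <= 2000:
        raise ValueError("Weber fraction is only stated for 500-2000 Hz")
    return k * frequency_hz


for f in (500, 1000, 2000):
    print(f, "Hz ->", round(frequency_dl(f), 2), "Hz")
```

The predicted DL grows from about 1 Hz at 500 Hz to about 4 Hz at 2000 Hz, so the same fractional change corresponds to a larger absolute change at higher frequencies.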

Figure 37
Figure 37 Difference limens for pitch as a function of frequency at a moderate loudness level

11.3 Frequency selectivity

In preceding sections we examined two ways in which the auditory system may code frequency information: the place theory and phase locking. In this section we will look at the psychophysical evidence for place coding on the basilar membrane by examining the ability of the auditory system to resolve the sinusoidal components of a complex sound – a phenomenon known as frequency selectivity.

The perception of a sound depends not only on its own frequency and intensity but also on other sounds present at the same time. You will all be familiar with the experience of one sound ‘drowning out’ another sound. For example, typical classroom sounds, created by movement, coughing, rustling of papers, make the instructor's voice difficult to hear. This phenomenon is called masking. Technically speaking, masking is defined as the rise in threshold of one tone (test tone) due to the presence of another (masker) tone.

It has been known for many years that a signal is most easily masked by a sound having frequency components close to those of the signal. This led to the idea that our ability to separate the components of a complex sound depends on the frequency-resolving power of the basilar membrane. It also led to the idea that masking reflects the limits of frequency selectivity and provides a way to quantify it.

11.3.1 A masking experiment

The procedure for a masking experiment is shown in Figure 38. First, the threshold for hearing is determined across a range of frequencies (Figure 38a). Then, a masking stimulus is presented at a particular place along the frequency scale and, while the masking stimulus is sounding, the thresholds for all frequencies are re-determined (Figure 38b). The masking stimulus used can be a pure tone or, more commonly, white noise. A white noise stimulus is simply one that contains a band of frequencies with equal sound pressure at each frequency. It sounds something like the ‘shhhhhhh’ sound you can make by blowing air across your teeth when they are slightly separated. The band of frequencies used can vary in width. For example, a band of frequencies 90 Hz wide centred on 410 Hz would contain frequencies ranging from 365 to 455 Hz.
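
The band limits in this example follow from the centre frequency and the bandwidth, as the following Python sketch shows. The function names are illustrative, and the band is assumed to be placed symmetrically (in Hz) about its centre.

```python
def band_edges(centre_hz, width_hz):
    """Lower and upper limits of a noise band placed symmetrically
    (in Hz) about its centre frequency."""
    half_width = width_hz / 2
    return centre_hz - half_width, centre_hz + half_width


def in_band(frequency_hz, centre_hz, width_hz):
    """True if a tone's frequency lies within the noise band."""
    low, high = band_edges(centre_hz, width_hz)
    return low <= frequency_hz <= high


print(band_edges(410, 90))    # (365.0, 455.0)
print(in_band(400, 410, 90))  # True
print(in_band(800, 410, 90))  # False
```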

Figure 38
Figure 38 The procedure for a masking experiment. (a) The threshold is determined across a range of frequencies. Each arrow indicates a frequency where the threshold is measured. (b) The threshold is re-determined at each frequency (small arrows) in the presence of a masking stimulus (large arrow)

When thresholds are measured in the presence of a masking noise the original thresholds are raised. Figure 39 shows the result of an experiment using a 90 Hz band of noise centred at 410 Hz.

There are two things to note about Figure 39. First, the threshold increases most for frequencies near the frequencies in the masking noise. Second, the curve is not symmetrical; the masking effect spreads more to high frequencies than to low frequencies. So, lower frequencies mask higher-frequency sounds much more effectively than the reverse.

Figure 39
Figure 39 Results of a masking experiment. The red line indicates the amount that the threshold is raised in the presence of a masking noise centred at 410 Hz. So for a 410 Hz tone, the threshold is raised by about 60 dB above absolute threshold

How is this masking effect explained?

Many masking effects can be explained in a very simplified way by analysing the interaction of displacement patterns in response to sound on the basilar membrane. Figure 40 shows the vibration patterns on the basilar membrane caused by a 400 Hz, 800 Hz and 1000 Hz tone. You can see that the vibration pattern of the 800 Hz tone overlaps those of the 400 and 1000 Hz tones. Note also that the pattern for the 800 Hz tone (which is shaded) almost totally overlaps the pattern for the higher frequency, 1000 Hz tone, but does not overlap the place of peak vibration of the lower frequency, 400 Hz tone. We would therefore expect the 800 Hz tone to mask the 1000 Hz tone more effectively than the 400 Hz tone. This is what happens, providing support for the place mechanism of frequency tuning on the basilar membrane.

Figure 40
Figure 40 Vibration patterns on the basilar membrane caused by 400, 800 and 1000 Hz tones
Figure 41
Figure 41 The procedure for measuring a psychophysical tuning curve. A 10 dB test tone (black arrow) is presented and then a series of masking tones (red arrows) are presented at the same time as the test tone. The psychophysical tuning curve is generated by determining the SPL threshold of the masking tones needed to reduce the perception of the test tone to threshold

Masking has also been used to determine psychophysical tuning curves. A low intensity tone, called the test tone, is presented throughout the experiment. A series of masking tones (also pure tones) are then presented, one at a time, with the test tone. One of the masking tones is the same frequency as the test tone and the others are higher or lower in frequency. The level of each masking tone is reduced until the test tone is just audible. To understand the rationale behind this experiment, imagine that the horizontal line in Figure 41 is the basilar membrane. Each masking tone, because it is a pure tone (single frequency), will cause vibration mainly at one point on the basilar membrane. If the masking tone is the same as, or similar in frequency to, the test tone, it will mask the test tone more effectively than a masking tone with a frequency far away from that of the test tone. The results of this kind of masking experiment are shown in Figure 42a. They show what we would expect according to the place theory of frequency coding on the basilar membrane: when the masking tone is the same frequency as the test tone it doesn't need to be very intense in order to mask the test tone. However, when the masking tone is higher or lower in frequency than the test tone, higher intensities are required to mask the test tone. You can compare the psychophysical tuning curves in Figure 42a with those obtained from auditory nerve fibres of a cat seen in Figure 42b.

The close match between them suggests that both reflect the same process – a place code for frequency on the basilar membrane.

Figure 42(a)
Figure 42 (a) Three human psychophysical tuning curves generated using the method described in Figure 41. The arrows show the frequency of three different test tones. You can see from the figure that when the masking tone is the same as, or close to, the test tone in frequency, the intensity of the masker needed to mask the test tone is low. (b) Three neural tuning curves showing the stimulus intensity needed to generate a constant response (firing rate) in the nerve fibre of a cat. Each curve represents a different auditory nerve fibre

11.4 Signal duration

Since hearing is largely a matter of stimulus reception over time, we would expect time to influence the perception of sound. It has been known for many years that both absolute thresholds and the loudness of sounds depend upon signal duration. The studies of absolute threshold described earlier were all carried out with tone bursts of relatively long duration. For durations exceeding 500 ms, the sound intensity at threshold is roughly independent of duration. However, for durations of less than 200 ms the sound intensity needed for detection increases as signal duration decreases. This also means that for sounds of less than 200 ms duration the intensity must be increased to maintain a constant level of loudness. Duration also affects the perception of frequency. For example, if a tone of an audible frequency and intensity is presented for only a few milliseconds, it will lose its tonal character and will either be inaudible or be heard as a click. The length of time a given frequency must last in order to produce the perception of a stable and recognisable pitch is about 250 ms. We are also better able to discriminate between tones of different frequencies when their duration is lengthened.

11.5 Summary of sections 8 to 11

In these sections we have described some of the quantitative relationships between the physical dimensions of simple sounds and their subjective psychological dimensions. The physical dimension of intensity, or pressure amplitude, given in decibels (dB), directly affects loudness. Frequency of pressure changes, in hertz (Hz), mainly determines pitch.

The lowest threshold value and hence the maximal sensitivity for humans is in the region of 3000 Hz.

The quantitative relationship between intensity and loudness is that loudness grows more slowly than intensity. Equal loudness contours indicate that humans are more sensitive to frequencies between 1000 and 4000 Hz than other frequencies within the hearing range. When intensity is held constant, sounds in the region of 3000 Hz appear louder than sounds of other frequencies. The minimum change in intensity of a sound that produces a perceptual difference is about 0.5 to 1 dB.

A number of different mechanisms play a role in intensity discrimination. Intensity changes can be signalled by both changes in the firing rates of neurons at the centre of the excitation pattern, and by the spreading of the excitation pattern. In addition, cues related to phase locking may also play a role in intensity discrimination. This may be particularly important for complex stimuli, for which the relative levels of different components may be signalled by the degree of phase locking to components.

The relationship between frequency and pitch is investigated using the mel scale. This shows that pitch is not linearly related to frequency. Pitch increases more rapidly than frequency for tones below 1000 Hz and less rapidly for tones above 1000 Hz.

The difference threshold for frequencies up to 1000 Hz is about 1 to 3 Hz, and for frequencies between 500 and 2000 Hz the Weber fraction remains constant at about 0.002. Intensity affects the difference threshold: the lower the intensity, the higher the difference threshold.

Masking experiments support the place theory of frequency selectivity on the basilar membrane.

Although the sound heard depends primarily on its frequency and intensity, both its pitch and its loudness are secondarily affected by the duration of the sound. Within limits, loudness and pitch recognition increase as the duration of a brief burst of sound is lengthened.

12 Sound localisation

12.1 Localisation of sound in the horizontal plane

While information about frequency and intensity is essential for interpreting sounds in our environment, sound localisation can be of critical importance for survival. For example, if you carelessly cross the street, your localisation of a car's horn may be all that saves you. Our current understanding of the mechanisms underlying sound localisation suggests that we use different techniques for locating sources in the horizontal plane and vertical plane.

Activity

Imagine a sound source that is directly in front of you. All else being equal, will the sound reach each ear at the same time?

Answer

Yes, if it is directly in front of you, since the distance it must travel to each ear is the same.

Activity

Will the sound be equal in loudness at your two ears given that as sound travels over distance, it decreases in intensity?

Answer

Yes, because it travels the same distance and therefore attenuates to the same extent before reaching each ear.

Activity

Now imagine a sound source that is directly to one side of your head – say the left side. Which ear will receive the sound first?

Answer

The left ear, because the distance the sound must travel to the left ear is shorter than to the right (it must travel over your head).

Activity

At which ear will the sound be louder?

Answer

At the left ear since it travels less distance and therefore attenuates less compared to the sound arriving at your right ear.

These two kinds of information, differences in the intensity of sound at the two ears (interaural intensity differences) and differences in the time of arrival of sound at the two ears (interaural time delays), enable our auditory system to localise a sound source in the horizontal plane.

12.2 Interaural time delays: non-continuous sounds

The average distance between human ears is about 20 cm. Therefore, if a sudden noise comes at you from the right, perpendicular to your head, it will reach your right ear 0.6 ms before it reaches your left ear. For a sound coming from directly in front of you there will be no delay, and at angles between, the delay will be between 0 and 0.6 ms. Therefore there is a simple relationship between the location of the sound source and the interaural delay. It is this delay that enables us to localise the source of a sound in the horizontal plane. How does the auditory system encode information about interaural time delays? It is thought that the mechanism involves a series of delay lines and coincidence detectors, as illustrated in Figure 43.
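
The 0.6 ms figure, and the delays at intermediate angles, can be approximated with a short Python sketch. It rests on two assumptions that are not in the text: a speed of sound of 343 m/s, and a simplified straight-line model in which the extra path to the far ear is the ear separation multiplied by the sine of the angle from straight ahead. Real heads make the sound travel around a curved surface, so this is only a rough guide.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0   # assumed value for air at about 20 °C
EAR_SEPARATION_M = 0.20          # figure quoted in the text


def interaural_delay_ms(azimuth_deg,
                        separation_m=EAR_SEPARATION_M,
                        speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Interaural time delay in ms for a source at the given angle from
    straight ahead (0 = in front, 90 = directly to one side), using a
    simplified straight-line path-difference model."""
    path_difference_m = separation_m * math.sin(math.radians(azimuth_deg))
    return 1000 * path_difference_m / speed_m_per_s


for angle in (0, 30, 60, 90):
    print(angle, "degrees:", round(interaural_delay_ms(angle), 2), "ms")
```

Under these assumptions the delay is zero for a source directly in front and about 0.58 ms for a source directly to one side, in line with the values given above.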

We know from our discussion about the auditory pathway that the first place where information from both ears comes together is in the superior olivary nucleus. Briefly, when a sound arrives at one ear it is transduced by the hair cells, elicits firing in the auditory nerve and evokes spikes in the axons that project from the cochlear nuclei to the medial superior olive. The same sound will initiate a similar series of events when it reaches the opposite ear. In Figure 43a, the sound reaches the left ear, and hence the left cochlear nucleus, first, resulting in the generation of action potentials that are relayed to cells in the superior olive. A fraction of a millisecond later, sound reaches the right ear, initiating activity in the right cochlear nucleus (Figure 43b). By this time, however, the impulses travelling from the left cochlear nucleus have travelled further along the axon (which is the delay line). Impulses from both ears reach coincidence at, in this case, olivary neuron 3, which then fires an action potential (Figure 43c).

Figure 43
Figure 43 Delay lines and coincidence detection

Activity

According to this illustration, if the sound were directly in front of the person, which of the olivary neurons would fire?

Answer

Olivary neuron 2, because the time it would take the action potential to travel from the left and the right cochlear nuclei would be the same and coincidence would occur in the midline.

You can see from this example that the auditory system can extract information about the location of a sound source by attending to which neuron in the superior olivary nucleus fires in response to the sound. This is because neurons will only fire if there is some specific delay between the spikes arriving from the left and right cochlear nuclei, and different neurons fire in response to different delays.
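
The delay-line idea can be illustrated with a toy model in Python. Everything about the model is an assumption made for illustration: three olivary neurons numbered 1 to 3 from left to right, with neuron 2 at the midline, and an axonal conduction time of 0.3 ms between neighbouring neurons. It is a sketch of the principle, not a description of real olivary circuitry.

```python
def coincident_neuron(left_arrival_ms, right_arrival_ms,
                      n_neurons=3, step_ms=0.3):
    """Return the number (counted from the left, starting at 1) of the
    model neuron at which impulses from the two sides arrive closest
    together in time."""
    best_neuron, best_mismatch = None, float("inf")
    for i in range(n_neurons):
        # Impulses from the left side reach neuron i after i conduction steps;
        # impulses from the right side travel the remaining steps.
        from_left = left_arrival_ms + i * step_ms
        from_right = right_arrival_ms + (n_neurons - 1 - i) * step_ms
        mismatch = abs(from_left - from_right)
        if mismatch < best_mismatch:
            best_neuron, best_mismatch = i + 1, mismatch
    return best_neuron


print(coincident_neuron(0.0, 0.0))   # source in front
print(coincident_neuron(0.0, 0.6))   # sound reaches the left ear first
print(coincident_neuron(0.6, 0.0))   # sound reaches the right ear first
```

In this toy model a source in front activates the midline neuron, while a sound that reaches the left ear first activates the neuron furthest to the right, because the left-side impulses have had time to travel further along the delay line.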

12.3 Interaural time delays: continuous tones

Coincidence detectors and delay lines cannot be used to localise a continuous tone.

Activity

Why?

Answer

Because a continuous tone is always present at both ears, and if we don't hear the onset of a sound then our auditory system cannot determine the initial difference in arrival times at the two ears.

So, in order to localise a continuous tone the auditory system uses another kind of temporal information: the time at which the same phase of the sound wave reaches each ear.

Recall that neurons are capable of phase locking to a sound stimulus: they fire at characteristic points or phase angles along the sound wave. A neuron tuned to one frequency would tend to fire, for example when the wave is at baseline (0 degrees), although it may not fire every time the wave reaches this position. A neuron tuned to a different frequency will tend to fire at a different phase angle, such as when the wave is cresting (90 degrees). In both ears impulses produced by neurons tuned to the same frequency will lock to the same phase angle. But, depending when the signals reach the ears, the train of impulses generated in one ear may be delayed relative to the impulse train generated in the other ear.

Imagine you are exposed to a 400 Hz sound coming from the right (Figure 44a). At this frequency, one cycle of sound covers about 85 cm, which is more than the 20 cm distance between your ears. After the peak in sound wave passes the right ear, you must wait 0.6 ms, the time it takes the sound to travel 20 cm, before detecting the same peak in your left ear. Because the wavelength of the sound wave is much longer than the distance between your ears (85 cm versus 20 cm) you can reliably use the interaural delay in peaks in the wave to determine sound location. What about wavelengths that are shorter than the distance between the ears?

Continuous tones of frequencies above about 1500 Hz produce what are known as phase ambiguities. This is because a sinusoidal wave of 1500 Hz has a wavelength about equal to the width of the head. You can see in Figure 44b that both ears will detect a peak in the sound wave at the same time. Clearly, the peaks detected at the ears are different (labelled 1 and 2 in the figure) but as far as the brain is concerned, there would be no phase difference and the sound would be perceived as coming from the front. Head movements may resolve this ambiguity to some extent. However, when the wavelength is less than the path difference between the two ears, ambiguities increase; the same phase difference could be produced by a number of different source locations. Phase differences therefore only produce useful cues for frequencies below about 1500 Hz.
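
The comparison between wavelength and ear separation can be worked through in Python. The sketch assumes a speed of sound of 343 m/s (not stated in the text) and uses the 20 cm ear separation given earlier; the function names are illustrative.

```python
SPEED_OF_SOUND_CM_PER_S = 34300.0   # assumed: 343 m/s in air
EAR_SEPARATION_CM = 20.0            # figure quoted in the text


def wavelength_cm(frequency_hz):
    """Wavelength of a tone in air, in cm."""
    return SPEED_OF_SOUND_CM_PER_S / frequency_hz


def lag_in_cycles(frequency_hz, separation_cm=EAR_SEPARATION_CM):
    """Interaural lag for a source directly to one side, expressed as a
    fraction of one cycle of the tone."""
    return separation_cm / wavelength_cm(frequency_hz)


for f in (400, 1500):
    print(f, "Hz: wavelength", round(wavelength_cm(f), 1), "cm,",
          "lag", round(lag_in_cycles(f), 2), "cycles")
```

At 400 Hz the lag is under a quarter of a cycle, so matching peaks at the two ears can be paired without confusion. At 1500 Hz the lag approaches one full cycle, which is when a peak at one ear coincides with a different peak at the other ear.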

Figure 44
Figure 44 Interaural time differences and phase ambiguity. (a) The signal comes from the right and waveform features such as the peak numbered 1 arrive at the right ear (solid line) 0.6 ms before arriving at the left (dotted line). Because the wavelength is more than twice the head diameter, no confusion is caused by the other peaks in the waveform (peaks 0 and 2) and the signal is correctly perceived as coming from the right. (b) The signal again comes from the right but the wavelength is shorter than the head diameter. As a result every feature of cycle 2 arriving at the right ear has a corresponding feature from cycle 1 at the left ear. The listener mistakenly concludes that the source is directly in front

12.4 Interaural intensity differences

The brain has another process for localising high-frequency sounds (above 1500 Hz): interaural intensity differences.

Activity

Where does processing of interaural intensity differences take place?

Answer

In the lateral superior olivary nucleus.

For any sound, there is a direct relationship between the direction that the sound comes from and the extent to which the intensity of the sound at the two ears differs. If the sound comes directly from the right, it will be lower in intensity at the left ear; if it comes from directly in front, the intensity at the two ears is the same; and for sound coming from intermediate directions, there are intermediate intensity differences. Intensity differences between the ears can result from two factors: differences in the distance the sound must travel to the two ears and differences in the degree to which the head casts a sound shadow. The greater the sound shadow cast by the head, the greater the level difference between the ears. The extent of the sound shadow cast by the head depends on the frequency of the sound. Low-frequency sounds have a wavelength that is long compared to the size of the head. The sound therefore bends very well around the head and there is very little sound shadow cast. In contrast, high-frequency sounds have a wavelength that is short compared to the dimensions of the head. This means that the head casts a significant sound shadow.

Activity

What would be the consequences of the difference in sound shadow cast by the head in response to low-frequency and high-frequency sounds?

Answer

For low frequencies, the sound shadow cast by the head would cause a minimal difference in the intensity of sound at the two ears and so there would be little scope for using interaural intensity differences in order to localise the sound. For high frequencies, the sound shadow cast by the head would cause a significant difference in the intensity of the sound at the two ears and so facilitate the use of interaural intensity differences.

In fact, interaural differences in intensity are negligible at low frequencies, but may be as large as 20 dB at high frequencies (Figure 45).

Figure 45 Low-frequency tones are not affected by the listener's head, so the intensity of a 200 Hz tone is the same at both ears. High-frequency tones (e.g. 6000 Hz) are affected by the presence of the listener's head and result in an acoustic shadow that decreases the intensity of the tone reaching the listener's far ear
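
To give a sense of scale for an interaural difference of 20 dB, the sketch below converts a level difference in decibels into the corresponding ratios of intensity and of sound pressure. It uses only the standard definition of the decibel; the function names are illustrative.

```python
def intensity_ratio(level_difference_db):
    """Ratio of sound intensities for a given level difference in dB."""
    return 10 ** (level_difference_db / 10)


def pressure_ratio(level_difference_db):
    """Ratio of sound pressures for a given level difference in dB."""
    return 10 ** (level_difference_db / 20)


print(intensity_ratio(20))  # intensity at the near ear relative to the far ear
print(pressure_ratio(20))   # the same difference expressed as a pressure ratio
```

A 20 dB difference therefore means that the intensity at the near ear is about 100 times that at the far ear, or about 10 times in terms of sound pressure.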

12.5 Localisation of sound in the vertical plane

Much of our ability to localise sound in the vertical plane is due to the shape of the outer ear, in particular the pinna. The pinnae provide a monaural cue to localisation. The bumps and ridges on the pinnae produce reflections, and delays between the direct path and the reflected path make vertical localisation possible. Vertical localisation is seriously impaired if the convolutions of the pinnae are covered.

12.6 Distance cues

There are two main cues available that allow us to judge the distance to a sound source. The first of these is the sound pressure level. Sound pressure level drops by 6 dB each time the distance that a sound travels doubles. In other words, if the sound pressure level of a sound is 60 dB SPL when its source is 1 m from you, then it will be 54 dB SPL if you move back another metre so that you are now 2 m away from its source. Therefore lower sound pressure levels indicate a greater distance. A second cue relates to the frequency of a sound. When a sound travels over a distance, the high-frequency components attenuate to a greater extent than the low-frequency components. This means that sounds that are further away tend to be richer in low frequencies and therefore have a lower pitch.
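
The 6 dB drop per doubling of distance can be written as a simple formula. The sketch below assumes a small source radiating freely in all directions with no reflections, which is an idealisation; the function name and parameters are illustrative.

```python
import math


def spl_at_distance(spl_ref_db, ref_distance_m, distance_m):
    """Sound pressure level at distance_m, given the level at ref_distance_m.

    Assumes free-field spreading from a small source, so the level falls
    by 20*log10 of the distance ratio (about 6 dB per doubling).
    """
    return spl_ref_db - 20 * math.log10(distance_m / ref_distance_m)


for distance in (1, 2, 4, 8):
    print(distance, round(spl_at_distance(60, 1, distance)))
```

Starting from 60 dB SPL at 1 m, this gives about 54 dB SPL at 2 m, as in the example above, and about 48 dB SPL at 4 m.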

12.7 Summary of Section 12

For precise localisation of a sound source, binaural cues are required.

Two types of binaural cue are used to localise non-continuous sounds in the horizontal plane: interaural time differences, which are most efficient for low-frequency sounds (20–1500 Hz) and interaural intensity cues, which are important for high-frequency sounds (1500–20 000 Hz). The frequency responses in the superior olive reflect these differences. The medial superior olive includes neurons that are responsive to low-frequency inputs, while the cells of the lateral superior olive are most sensitive to high-frequency stimuli.

The mechanism involved for the detection of interaural time differences is believed to involve a series of delay lines and coincidence detectors.

For localisation of continuous tones, interaural phase differences are used.

Information about the location of a signal in the vertical plane is provided by the pinnae.

We are able to judge the distance to a sound source using cues related to the decay of the signal with distance. Both the SPL and the spectral components of a signal are dependent on the distance between the signal and the listener.
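
The delay-line and coincidence-detector idea mentioned in this summary can be illustrated with a toy calculation. The sketch below is not a physiological model: it simply tries a range of internal delays and reports the one at which the signals from the two ears coincide best. The sample period of 0.2 ms, the pulse shape and the function name are all assumptions chosen for illustration.

```python
def estimate_lag(reference, other, max_lag):
    """Return the lag (in samples) of `other` relative to `reference`.

    Each candidate lag plays the role of one coincidence detector fed by
    a delay line: its response is the summed product of the two signals
    after the reference has been delayed by that lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A brief click reaching the right ear first and the left ear 3 samples later.
pulse = [0.0] * 16
pulse[3], pulse[4] = 1.0, 0.5
right_ear = pulse
left_ear = [0.0] * 3 + pulse[:-3]

print(estimate_lag(right_ear, left_ear, max_lag=5))  # 3
```

If each sample is taken to represent 0.2 ms, a lag of 3 samples corresponds to the 0.6 ms interaural delay of a sound arriving directly from the right.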

12.8 More revision questions

Question 12

  • (a) If two tones are broadcast through headphones at an intensity of 100 dB SPL, which will sound louder, a 100 Hz tone or a 1000 Hz tone? Why?

  • (b) How loud must a 100 Hz tone and a 1000 Hz tone be (in dB SPL) in order to have a loudness level of 50 phons?

Answer
  • (a) They will sound equally loud – they both fall on approximately the same equal loudness contour (see Figure 35).

  • (b) The 100 Hz tone must be about 68 dB SPL and the 1000 Hz tone about 52 dB SPL (Figure 35).

Question 13

  • (a) How are beats generated?

  • (b) How are they perceived?

Answer
  • (a) Beats occur when two tones of slightly different frequency are broadcast simultaneously. As the relative phase of two simultaneously applied tones changes continuously, so the tones alternately reinforce and cancel one another.

  • (b) They are perceived as a single tone with a pitch midway between the two tones but periodically varying in loudness: waxing and waning of loudness. The intensity varies at a rate equal to the frequency difference of the two tones.
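
The answer above can be checked numerically. The sketch below uses the standard trigonometric identity for the sum of two sines of equal amplitude: the sum equals a tone at the average frequency multiplied by a slowly varying envelope. The frequencies 440 Hz and 444 Hz are arbitrary example values, and the function names are illustrative.

```python
import math


def two_tone_sum(f1_hz, f2_hz, t_s):
    """Two equal-amplitude tones added together at time t_s."""
    return math.sin(2 * math.pi * f1_hz * t_s) + math.sin(2 * math.pi * f2_hz * t_s)


def beating_tone(f1_hz, f2_hz, t_s):
    """The same signal written as a tone at the average frequency
    whose amplitude waxes and wanes."""
    envelope = 2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s)
    carrier = math.sin(2 * math.pi * ((f1_hz + f2_hz) / 2) * t_s)
    return envelope * carrier


f1, f2 = 440.0, 444.0
print((f1 + f2) / 2)   # perceived pitch in Hz: 442.0
print(abs(f1 - f2))    # loudness fluctuations per second: 4.0

# The two descriptions agree at every sampled instant.
times = [n / 8000 for n in range(8000)]
print(max(abs(two_tone_sum(f1, f2, t) - beating_tone(f1, f2, t)) for t in times) < 1e-9)
```

The envelope passes through zero twice per cycle of its cosine, which is why the loudness rises and falls at the difference frequency rather than at half of it.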

Question 14

Why is it that when you play music softly that has been recorded at a fairly high level, you cannot hear the very high and very low frequencies?

Answer

When music is played loudly, above about 80 dB SPL, all tones from about 30 Hz to about 5000 Hz have about the same loudness (they fall on the same loudness contour – see Figure 35). However, when you turn the intensity of the music down, not all frequencies sound equally loud. At 10 dB, for example, frequencies below about 400 Hz (the bass notes) and those above about 8000 Hz (the treble notes) are inaudible. So, if you play the music softly, you won't hear the very low and very high frequencies.

Question 15

How does the use of masking experiments support the place code hypothesis for pitch perception?

Answer

When a test tone is played in the presence of a masking tone or masking noise, our ability to hear the test tone is impaired. According to the place code hypothesis, this is because the test tone and the masker cause vibration at overlapping places on the basilar membrane.

Question 16

Why are interaural time differences not very useful for localising high-frequency sounds?

Answer

For high frequencies, the wavelength of the sound is less than the distance between our ears. This means that the delay in the arrival of the sound at the two ears can create phase ambiguities.

Question 17

What properties of a sound determine its pitch and its timbre?

Answer

Pitch is mainly determined by the frequencies of the low-numbered harmonics in a sound, whereas timbre is determined by which frequency regions have more energy, that is, by the relative intensities of the different harmonics.

At this point you should read Hearing impairments: causes, effects and rehabilitation by David Baguley and Don McFerran attached below. This chapter shows how dysfunction at any level of the auditory system can lead to hearing impairment and discusses some possible treatments for hearing loss.

Click View Document to open Hearing impairments: causes, effects and rehabilitation by David Baguley and Don McFerran

Acknowledgements

Except for third party materials and otherwise stated (see terms and conditions), this content is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Licence

Course image: Don McCullough in Flickr made available under Creative Commons Attribution-2.0 Licence.

Grateful acknowledgement is made to the following sources for permission to reproduce material in this course:

The following is contained in chapters 1, 2 and 6 from Signals and Perception: The fundamentals of Human Sensation, edited by David Roberts and published by Palgrave Press in association with The Open University. Copyright © (2002) The Open University. This publication forms part of an Open University course, SD329 Signals and Perception: The Science of the Senses.

Chapter 1 ‘The mechanics of hearing’: Jonathan Ashmore (University College London)

Figure 2: Geisler, C.D. (1988) From Sound to Synapse: Physiology of the Mammalian Ear, copyright © 1988 by C. Daniel Geisler. Used by permission of Oxford University Press; Figure 4: Science Museum/Science and Society Picture Library; Figures 5, 6: Rosowski, J.J. (1996) ‘Chapter 2: Models of External and Middle ear Function’, Hawkins, H.L. et al. (eds), Auditory Computation, Springer-Verlag; Figure 9a: courtesy of Professor Andy Forge, UCL Centre for Auditory Research, University College London; Figure 11: The Nobel Foundation.

Chapter 2 ‘The transformation of sound stimuli into electrical signals’: Robert Fettiplace (University of Wisconsin)

Figure 5a: Furness, D.N. et al. (1997) Proceedings of the Royal Society of London, vol. 264, 1997, The Royal Society of London; Figure 7: Reprinted from Hearing Research, vol. 24, Palmer, A.R. and Russell, I.J., Copyright (1986), with permission of Elsevier Science; Figure 8: Rose, J.E., Galambos, R. and Hughes, J.R. (1959) Bulletin of the Johns Hopkins Hospital, vol. 104, 1959, © Johns Hopkins Hospital. Reprinted by permission of The Johns Hopkins University Press; Figure 9: Reprinted from Hearing Research, 22, Kiang, N.Y.S. et al., Copyright (1986), with permission from Elsevier Science; Figure 10: Reprinted from Current Opinion in Neurobiology, Vol, Ashmore, J.F. and Kolston, P.J., Copyright (1994), with permission from Elsevier Science.

Chapter 6 ‘Hearing impairments: causes, effects and rehabilitation’ David Baguley (Addenbrooke’s Hospital, Cambridge) and Don McFerran (Essex County Hospital)

Figure 2: Courtesy of Roy F Sullivan, PhD, http://www.rcsullivan.com (no longer accessible); Figures 5, 6: reproduced with permission of Advanced Bionics; Figure 8: Tyler, R. Tinnitus Handbook, © 2000. Reproduced with permission of Delmar, a division of Thomson Learning.

Figure 4 (a) Picture by Mireille Lavigne-Rabillard, from ‘Promenade around the cochlea’, by R. Pujol, S. Blatrix, T.Pujol and V. Reclar-Enjalbert, () CRIC, University Montpellier;

Figure 7 (b) Copyright © Science Photo Library;

Figure 10 Bekesy, G. v. (1953) ‘Description of some mechanical properties of the organ of Corti’, The Journal of the Acoustical Society of America, vol. 25, no. 4, July 1953, American Institute of Physics;

Figure 11 (b) Zemlin, W. R. (1981) Speech and Hearing Science: Anatomy and Physiology, 2nd edn, Prentice-Hall, Inc. Copyright © 1981, 1968 by Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632;

Figure 14 Yost, W. A. (2000) ‘Peripheral auditory nervous system and haircells’, Fundamentals of Hearing: An Introduction, 4th edition, Academic Press. Copyright © 2000 by Academic Press;

Figure 19 Kiang, N. Y. S. (1980) ‘Processing of speech by the auditory nervous system’, The Journal of the Acoustical Society of America, vol. 68, American Institute of Physics;

Figure 21 Kandel, E. R., Schwartz, J. H. and Jessell, T. M. (2000) Principles of Neural Science, 4th edition, McGraw-Hill. Copyright © 2000 by The McGraw-Hill Companies, Inc. All rights reserved;

Figure 23 (a) Rose, J. E., Hind, J. E., Anderson, D. J. and Brugge, J. F. (1971) ‘Some effects of stimulus intensity on response of auditory nerve fibers in the Squirrel Monkey’, Journal of Neurophysiology, 34, The American Physiological Society;

Figure 23 (b) Goldstein, E. B. (1999) Sensation and Perception, 5th edition, Brooks/Cole Publishing Company. Copyright © 1999 by Brooks/Cole Publishing Company. A division of International Thomson Publishing, Inc;

Figure 24 Adapted from Wever, E. G. (1949) Theory of Hearing, John Wiley & Sons, Inc. Copyright © 1949 John Wiley & Sons, Inc. Reprinted by permission;

Figure 27 Lindsay, P. H. (1972) Human Information Process: An Introduction to Psychology, Academic Press. Copyright © 1972 by Academic Press;

Figure 30 Bear, M. F., Connors, B. W. and Paradiso, M. A. (1996) Neuroscience: Exploring the Brain, Williams & Wilkins. Copyright © 1996 Williams & Wilkins;

Figure 34 Sivian, L. J. and White, S. D. (1933) ‘On minimum audible sound fields’, The Journal of the Acoustical Society of America, vol. 4, American Institute of Physics;

Figure 35 Fletcher, H. and Munson, W. A. (1933) ‘Loudness, its definition, measurement and calculation’, The Journal of the Acoustical Society of America, vol. 5, October 1933, American Institute of Physics;

Figure 36 Stevens, S. S., Volkmann, J. (1940) ‘The relation of pitch to frequency: a revised scale’, The American Journal of Psychology, vol. 53, no. 3, July 1940, The University of Illinois Press;

Figure 37 Harris, J. D. (1952) ‘Pitch discrimination’, The Journal of the Acoustical Society of America, vol. 24, no. 6, November 1952, American Institute of Physics; Egan, J. P. and Hake, H. W. (1950) ‘On the masking pattern of a simple auditory stimulus’, The Journal of the Acoustical Society of America, vol. 22, no. 5, September 1950, American Institute of Physics;

Figure 42 (a) Zwicker, E. and Terhardt, E. (1974) Facts and Models in Hearing, Springer-Verlag. Copyright © by Springer-Verlag Berlin Heidelberg 1974;

Figure 42 (b) Goldstein, E. B. (1999) Sensation and Perception, 5th edition, Brooks/Cole Publishing Company. Copyright © 1999 by Brooks/Cole Publishing Company. A division of International Thomson Publishing, Inc;

Figure 43 Bear, M. F., Connors, B. W. and Paradiso, M. A. (1996) Neuroscience: Exploring the Brain, Williams & Wilkins. Copyright © 1996 Williams & Wilkins.
