4.3 Recording and analysing speech
Generally, sound waves are much more complicated than a wave made by humming a single note or the sound made by an electric toothbrush. For example, Figure 23 shows the waveforms generated by someone saying ‘yes’ followed by ‘no’. Let this be called Recording A.
Figure 24 shows another recording, in which each burst of sound is a person saying either ‘yes’ or ‘no’. Let this be called Recording B. Comparing Recording B with Recording A, which of the bursts of sound in Recording B is ‘yes’ and which is ‘no’?
The left sound burst in Recording A has a long tail, presumably caused by the long ‘s’ sound at the end of ‘yes’. The right sound burst does not have this tail. In Recording B, the right sound burst has a tail but the left sound burst does not. From this, it can be guessed (correctly) that the left sound burst in Recording B is ‘no’ and the right sound burst is ‘yes’.
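The same comparison of tails can be sketched in code. The short Python example below is only an illustration under stated assumptions: it does not use the real recordings, but builds two synthetic bursts (a loud low-pitched tone standing in for a vowel, optionally followed by a quiet high-pitched hiss standing in for the ‘s’). The function names, the threshold and all the numbers are hypothetical choices made for this sketch.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value for this sketch)


def make_burst(vowel_seconds, tail_seconds):
    """Build a synthetic burst: a loud 150 Hz tone, then a quiet 3000 Hz hiss.

    This is made-up data standing in for a spoken word, not a real recording.
    """
    vowel = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE)
             for n in range(int(vowel_seconds * SAMPLE_RATE))]
    tail = [0.2 * math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE)
            for n in range(int(tail_seconds * SAMPLE_RATE))]
    return vowel + tail


def tail_length(samples, threshold=0.05):
    """Return the time in seconds from the loudest sample to the last
    sample whose size is still above the threshold."""
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    last_loud = loud[-1] if loud else peak_index
    return (last_loud - peak_index) / SAMPLE_RATE


# Two bursts in an unknown order, as in Recording B.
left_burst = make_burst(vowel_seconds=0.2, tail_seconds=0.0)
right_burst = make_burst(vowel_seconds=0.2, tail_seconds=0.3)

left_tail = tail_length(left_burst)
right_tail = tail_length(right_burst)

# The burst with the longer tail is guessed to be 'yes'.
guess = ("no", "yes") if right_tail > left_tail else ("yes", "no")
print("left burst:", guess[0], "| right burst:", guess[1])
```

Real speech is far less tidy than these synthetic bursts, so a single measurement such as tail length would not be a reliable way to tell words apart in practice.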
The fact that different words result in different wave patterns underlies the technology of speech recognition. This technology has reached a high level of performance over the last half century, but it has had to overcome some formidable problems along the way. For example, are one person’s speech patterns the same as another’s?
Return to Interactive 1 (which you should still have open in a separate tab) and spend five or ten minutes experimenting by making your own sounds. If you hum a low note, are you able to calculate its frequency? Can you distinguish the patterns when you say ‘yes’ and ‘no’? Are your ‘yes’ and ‘no’ wave patterns similar to those shown in Figure 23?
When you have finished, close the interactive.
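If a hummed note were captured as a list of numbers (samples), its frequency could be estimated by counting cycles and dividing by the time they take. The Python sketch below does this by finding the points where the wave crosses zero going upwards. It is a minimal illustration, assuming the hum is close to a single pure tone; the function name, the sample rate and the 110 Hz test tone are all hypothetical values chosen for the example.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Estimate frequency in hertz by counting upward zero crossings.

    Returns None if fewer than two crossings are found.
    """
    # Positions where the wave goes from below zero to zero or above.
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    if len(crossings) < 2:
        return None
    cycles = len(crossings) - 1  # whole cycles between first and last crossing
    seconds = (crossings[-1] - crossings[0]) / sample_rate
    return cycles / seconds


# A synthetic one-second 'hum' at 110 Hz (made-up data, not a real recording).
sample_rate = 8000
hum = [math.sin(2 * math.pi * 110 * n / sample_rate + 0.3)
       for n in range(sample_rate)]

estimate = estimate_frequency(hum, sample_rate)
print("Estimated frequency in Hz:", round(estimate, 1))
```

A real hum contains more than one frequency, and the extra wiggles in its waveform can add extra zero crossings, so this simple count is best treated as a rough estimate.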