Designing the user interface: text, colour, images, moving images and sound

1.6.5 Using speech to good effect

Speech output is a powerful way of communicating information, and it has particular benefits for visually impaired users. Speaking lifts, for example, may seem a novelty to people with good eyesight, but they give visually impaired users useful information and reassurance. Other applications of the technology have less obvious benefits: supermarket checkouts that read out products and prices were found to be noisy and to breach customers' sense of privacy. Again, good design depends upon a good understanding of the users and of the environment in which the technology is going to be used. Box 2 describes an interesting speech-based system, but how many customers would want their financial details broadcast to the high street?

One of the benefits of good-quality speech output over text is that it communicates tone of voice, pace and accent. In this way it provides more information and helps to make the speaker seem more real to the user. The tone of voice can differ according to the content of the message: a warning message could sound urgent and an information message could sound reassuring.
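As a rough illustration of matching tone to content, the sketch below wraps a message in SSML `prosody` markup, which many speech synthesisers accept. The `PROSODY` table, the two message kinds and the function name `to_ssml` are assumptions made for this example, and the output is a bare fragment rather than a complete SSML document.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from message kind to SSML prosody settings.
# The rate and pitch labels are standard SSML values; which label suits
# which kind of message is an assumption made for this sketch.
PROSODY = {
    "warning": {"rate": "fast", "pitch": "high"},       # aims to sound urgent
    "information": {"rate": "medium", "pitch": "low"},  # aims to sound calm
}


def to_ssml(kind: str, text: str) -> str:
    """Wrap text in a prosody element chosen by message kind."""
    settings = PROSODY[kind]
    return (
        f'<speak><prosody rate="{settings["rate"]}" pitch="{settings["pitch"]}">'
        f"{escape(text)}</prosody></speak>"
    )


if __name__ == "__main__":
    print(to_ssml("warning", "Doors closing"))
    print(to_ssml("information", "Third floor"))
```

Whether a faster, higher voice is actually heard as urgent depends on the synthesiser and on the listeners, so settings like these would need testing with real users.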

Box 2: Advanced automatic teller machines (ATMs)

Over more than 30 years, bank ATMs have transformed the way in which we carry out banking transactions. However, the basic interaction technology has remained unchanged, using a numeric keypad and some buttons arranged around a small monochrome screen. More recently, this has been augmented by sound output indicating when an action needs to be taken, such as removing the card from the machine.

This approach has worked very successfully, but there have been ongoing problems with security. In particular, a stolen card can be used in the machines if the PIN is known. One way around this is the use of iris recognition. In such systems, the user is required to look into a camera. The camera takes a photograph of the iris (the coloured area around the pupil) and the system analyses its complex patterns. These patterns are unique to an individual and hence allow the system to identify the user.

A system known as STELLA is being piloted by NCR in Canada. In addition to iris recognition, it supports speech recognition and speech generation. The user walks up to the machine and stands on a pressure-sensitive mat that indicates their presence. Once iris recognition is complete, the user speaks to the system and the system replies with the information available. There is therefore no keypad or screen.

Drawn from NCR Press Release: 21 June 1999

Michaelis and Wiggins (1982) suggest that speech output is most effective when the following conditions are met:

  • message is simple

  • message is short

  • message will not be referred to later

  • message deals with events in time

  • message requires an immediate response

  • visual channels are overloaded

  • environment is too brightly lit, too poorly lit, subject to severe vibration or otherwise unsuitable for transmission of visual information

  • user is free to roam around

These guidelines assume that the output is digitised human speech or the playback of tape recordings.

(See Shneiderman, 1988)
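The conditions listed above can be treated as a simple design checklist. The following is a minimal sketch only: the class name `MessageContext`, its field names and the `threshold` of five conditions are assumptions made for illustration. Michaelis and Wiggins give no scoring rule, so the threshold is a design judgement rather than part of their guidance.

```python
from dataclasses import dataclass, fields


@dataclass
class MessageContext:
    """One boolean per condition in the list (field names are hypothetical)."""
    simple: bool
    short: bool
    not_referred_to_later: bool
    deals_with_events_in_time: bool
    needs_immediate_response: bool
    visual_channels_overloaded: bool
    environment_unsuitable_for_visuals: bool
    user_free_to_roam: bool


def conditions_met(ctx: MessageContext) -> list[str]:
    """Return the names of the conditions that hold for this message."""
    return [f.name for f in fields(ctx) if getattr(ctx, f.name)]


def favours_speech(ctx: MessageContext, threshold: int = 5) -> bool:
    """Assumed rule of thumb: speech is worth considering when at least
    `threshold` conditions hold. The threshold is not from the source."""
    return len(conditions_met(ctx)) >= threshold


if __name__ == "__main__":
    # A lift announcing the floor it has reached.
    lift = MessageContext(
        simple=True,
        short=True,
        not_referred_to_later=True,
        deals_with_events_in_time=True,
        needs_immediate_response=True,
        visual_channels_overloaded=False,
        environment_unsuitable_for_visuals=False,
        user_free_to_roam=False,
    )
    print(conditions_met(lift))
    print(favours_speech(lift))
```

A checklist like this only prompts the questions to ask. As the supermarket checkout example shows, privacy and noise in the environment can outweigh a high count.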