2.3 Operant conditioning
According to behaviourism, all behaviour is learned and maintained by its consequences. B. F. Skinner (1905–1990) devised apparatus and methods for studying these effects. Figure 3 shows a ‘Skinner Box’ designed for use with a rat. The early behaviourists often studied animal learning and then extrapolated their findings to human learning, because they proposed that the same fundamental principles underpin learning in all species.
The animal in the box can choose to behave in a variety of ways. The box contains a lever that delivers a food pellet when pressed. Initially, while moving about in the box, the animal discovers by accident that when the lever is pressed, food appears. Over time the rate at which the animal presses the lever increases, and other behaviours decrease by comparison. This suggests that the animal has learned to associate pressing the lever with the appearance of food. In Skinner's terminology, the lever-pressing behaviour has been reinforced: the consequences of pressing the lever made lever pressing more likely to occur in the future. If pressing the lever instead resulted in an unpleasant experience, such as an electric shock, lever-pressing behaviour would occur less often. This is an example of punishment. (Punishment is an environmental stimulus that results in a decrease in a given behaviour.) The important point to remember is that reinforcement always refers to something that increases the frequency of a given behaviour, whereas punishment always refers to something that reduces the frequency of a given behaviour. ‘Punishment’ is therefore used here as a technical term with a precise meaning that differs from its everyday meaning.
Reinforcement, an environmental stimulus that results in an increase in a given behaviour, has both positive and negative forms. The terms ‘positive’ and ‘negative’ refer to the presentation or removal of an environmental stimulus. So, for example, ‘positive reinforcement’ refers to the presentation of a stimulus that increases the occurrence of a behaviour. ‘Negative reinforcement’ refers to an increase in a behaviour following the removal of an unpleasant (‘aversive’) stimulus (e.g. a child increases the frequency of ‘room-cleaning behaviour’ because it results in the removal of parental disapproval).
Punishment can take one of three forms. ‘Positive punishment’ refers to the presentation of an unpleasant stimulus that will decrease the occurrence of the behaviour it follows. ‘Time-out’ is where a child is isolated from a reinforcing stimulus in their environment, with the aim of producing a decrease in the target behaviour. Finally, ‘response cost’ is where a penalty is applied every time an undesired behaviour is produced, again resulting in a decrease in that behaviour. The penalty may be, for example, the removal of ‘tokens’ – items that are valued by the person, such as reward stickers or money. Table 1 summarises reinforcement and punishment.
Table 1: Reinforcement and punishment
|Term|Environmental change|Effect on behaviour|
|---|---|---|
|Positive reinforcement|Positive stimulus presented|Behaviour increases|
|Negative reinforcement|Aversive stimulus removed|Behaviour increases|
|Positive punishment|Aversive stimulus presented|Behaviour decreases|
|Time-out|Isolation from reinforcer|Behaviour decreases|
|Response cost|Valued item removed (e.g. a token)|Behaviour decreases|
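The logic of Table 1 can also be read as a simple lookup. The following is a minimal Python sketch for illustration only: the names `TABLE_1`, `family` and `classify`, and the short text labels used as keys, are hypothetical and are not part of Skinner's terminology. It shows that the effect on behaviour alone decides whether something counts as reinforcement or punishment, while the kind of environmental change decides the specific form.

```python
from typing import Optional

# Hypothetical labels paraphrasing the rows of Table 1:
# (environmental change, effect on behaviour) -> technical term.
TABLE_1 = {
    ("positive stimulus presented", "increases"): "positive reinforcement",
    ("aversive stimulus removed", "increases"): "negative reinforcement",
    ("aversive stimulus presented", "decreases"): "positive punishment",
    ("isolation from reinforcer", "decreases"): "time-out",
    ("valued item removed", "decreases"): "response cost",
}


def family(effect: str) -> str:
    """Reinforcement or punishment, decided by the effect on behaviour alone."""
    if effect == "increases":
        return "reinforcement"
    if effect == "decreases":
        return "punishment"
    raise ValueError(f"unknown effect: {effect!r}")


def classify(change: str, effect: str) -> Optional[str]:
    """Return the specific Table 1 term, or None if the pair is not a row."""
    return TABLE_1.get((change, effect))


print(family("decreases"))                                   # punishment
print(classify("aversive stimulus removed", "increases"))    # negative reinforcement
```

A combination that does not appear in the table, such as an aversive stimulus being presented while the behaviour increases, returns `None` from `classify`, although `family` would still label the outcome as reinforcement because the behaviour increased.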
As with classical conditioning, extinction can occur if the behaviour is no longer reinforced. However, it should be noted that extinction is usually preceded by an extinction burst, which is a period of increased production of a previously reinforced behaviour following the withdrawal of that reinforcement.
Activity 1 Understanding punishment and reinforcement terms
This activity will help you to understand the meaning of the different types of reinforcement and punishment.
Read each statement below and identify which ones are examples of (a) positive reinforcement, (b) negative reinforcement, (c) positive punishment, (d) time-out and (e) response cost.
1. Getting burned when touching a hot pan, and never doing it again.
2. Getting a gold star for neat handwriting, and increasing your attempts to write neatly.
3. Watching your parents walk away when you are having a tantrum, and eventually calming down to run after them.
4. Stopping hitting your brother after you have a favourite toy taken away every time you hit him.
5. Having had no opportunity to eat all day, you start eating a large chocolate bar, and stop after eating three-quarters of it.
The important thing to note in all these examples is what happened to the person's behaviour in relation to the environmental change, as it is the actual effect on behaviour that defines something as reinforcing or punishing. So, being burned in (1) is an example of positive punishment, as the presence of the burning sensation reduced the future incidence of the behaviour. (2) is an example of positive reinforcement, as being given the star increased the production of neat writing. (3) is an example of time-out: the removal of parental attention resulted in reduced tantrum behaviour. (4) is an example of response cost: the favourite toy was systematically removed every time the undesired behaviour was produced. (5) is an example of negative reinforcement: your hunger (an aversive stimulus) is removed by eating three-quarters of the chocolate bar.
However, ideally we should consider all these behaviours over time. For example, in (5) if your future consumption of chocolate decreased, then your ‘chocolate-eating behaviour’ was punished (eating three-quarters of a bar of chocolate may have made you feel unwell). If this behaviour increased in future then it was reinforced – either negatively (by reducing hunger) or positively (because you love chocolate!). This highlights one of the difficulties in identifying reinforcers and punishers in practice: they are defined by their outcomes, which may vary from individual to individual. For example, what is ‘reinforcing’ for one person may be ‘aversive’ for another.
In addition to reinforcement and punishment, Skinner examined the effect that different schedules of reinforcement have on the production of a behaviour: does it matter if a reward or punishment is not presented every time a behaviour is produced? (A schedule of reinforcement is the frequency and/or regularity of a given reinforcement or punishment in a setting.) Of particular significance is the predictability of the environment: the more unpredictable the pattern of reinforcement or punishment, the more resistant the behaviour will be to extinction. Consider the example of a child who has learned to expect a gold star every time she produces good work; as soon as the stars stop appearing she will quickly become demotivated. However, if she learns that she only occasionally gets gold stars for good work, she will be more likely to sustain good work in the expectation that she will, eventually, get a star again.
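The gold star example can be illustrated with a toy simulation. The sketch below is not Skinner's model and is not drawn from experimental data. It rests on one loudly stated assumption: a learner keeps responding after rewards stop until the current run of unrewarded responses is longer than any such run experienced during training. The function names and parameters (`longest_dry_run`, `responses_before_giving_up`, `reward_probability`, `training_trials`, `seed`) are hypothetical.

```python
import random


def longest_dry_run(rewards):
    """Length of the longest run of consecutive unrewarded responses."""
    longest = current = 0
    for rewarded in rewards:
        current = 0 if rewarded else current + 1
        longest = max(longest, current)
    return longest


def responses_before_giving_up(reward_probability, training_trials=200, seed=0):
    """Unrewarded responses produced once rewards stop, under the toy assumption.

    Training: each response is rewarded with the given probability.
    Extinction: the learner stops as soon as the unrewarded run exceeds
    the longest unrewarded run it met during training (an assumption).
    """
    rng = random.Random(seed)
    training = [rng.random() < reward_probability for _ in range(training_trials)]
    return longest_dry_run(training) + 1


continuous = responses_before_giving_up(1.0)     # a star for every piece of good work
intermittent = responses_before_giving_up(0.25)  # a star only occasionally
print(continuous, intermittent)
```

Under this assumption the continuously rewarded learner gives up after a single unrewarded response, because any missing reward is immediately out of pattern, whereas the occasionally rewarded learner persists for longer because gaps between rewards were already part of its experience.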