In this section, we attempt to answer the question 'How do psychologists use standardised tests to assess abilities?'
- What is the difference between tasks and standardised tests?
- How are standardised tests used?
- Do tests provide the correct information about a child’s ability?
- The development of standardised tests
- Testing, context and language
- Cultural appropriateness and ecological validity (and references)
It is common for people to describe the tasks that developmental researchers use with children as ‘tests’, but this term is properly used only for tools such as the British Picture Vocabulary Scale or Raven’s Progressive Matrices which are published materials, with manuals and scoring directions.
These are usually available only to appropriately qualified psychologists who are registered with the test supplier, the two most important suppliers being the Psychological Corporation and Harcourt. These suppliers will only issue test materials to bona fide researchers and practitioners, under licence agreements which restrict their use to specified circumstances, such as testing children in schools to assess their educational needs or as part of an institution-based research project.
Such standardised tests are only published after extensive periods of development to establish their validity (i.e. they measure what they claim to), their reliability (i.e. results are consistent) and to produce norms based on data gathered from carefully sampled appropriate populations.
Psychological tests are not necessarily paper-and-pencil, although many are; they may include the use of toys, as in the Test of Pretend Play, or a variety of objects, such as cups, blocks, crayons and boxes, as in the Bayley Scales of Infant Development. Such tests are commonly called ‘psychometrics’ because they aim to provide a measurement (or metric) of some psychological function(s).
Standardised tests can include tasks that the child has to complete, such as naming a series of pictures of objects or placing a set of rods in a row of holes in a plastic strip, but the term ‘task’ is a much broader one and basically refers to any activity that requires a child’s active engagement with some materials. The tasks that developmental researchers use may sometimes appear simple, but they are usually the result of extensive trial-and-error piloting with children, designed to produce tasks that are good at revealing children’s abilities.
Assessment is a broader term that refers to the way that many clinicians and researchers will use one or more tests or tasks, and their own observations as well, to form a general impression of a child’s ability, state of mind or other psychological aspect. The so-called ‘clinical method’ as used by Piaget and other researchers favouring this style is similar, in that a standard task is used as a starting point for exploring a child’s understanding, for example, by asking additional probe questions after the child has completed the task.
To ensure that the results of developmental research are reliable, it is important that tasks are administered to children consistently, in a standard way. Otherwise, we cannot be sure that variations in the way the tasks have been presented by the researchers have not biased some children’s results in one direction and other children’s in another.
It is also important that research results are replicable; that is, if a study is conducted by another researcher, with a new sample of children, the results should be at least comparable.
It’s to avoid such problems that researchers typically draw up ‘protocols’ for tasks that they wish to use in research and develop manuals so that administration can be ‘manualised’ (to borrow a term from clinical psychology), that is, to follow a procedure as laid out in a manual accompanying the task materials.
In research, because of the practice of writing up results for publication, and so that other people can collect comparable data, it is especially important to be precise about how long a researcher should wait for a child to answer a question, for example, or how long a child should be given to get six pegs into six holes in a plastic strip.
Although it is not always the case, a researcher is usually interested in using a child’s performance in a test or task to infer the child’s underlying ability or other psychological attribute. Thus, for example, the British Ability Scales are not intended just to measure how good a child is at answering the BAS questions and performing the tasks in the scales; they are primarily intended to give an indication of a child’s underlying abilities (i.e. their competence).
In spite of this aim, and all the careful psychometric development work that goes into ensuring that tests and tasks are indeed validly tapping some underlying aspect of a child’s psychological functioning, a child’s performance will never give a 100% accurate measure of their competence. For one thing, children’s performance varies from day to day, due to factors like tiredness or alertness, the way that the tester relates to the child and many other factors.
It is also sometimes the case that the manualising of a test prevents the assessor from probing further into a child’s ability to answer a question, even though they might suspect that the child knows the correct answer.
In such circumstances there is a trade-off between assessing a child’s ability relative to what other children of the same age do in exactly the same circumstances, and providing an in-depth assessment of one child’s abilities without knowing how these relate to what other children are able to do in the same circumstances.
Much of developmental research is looking for differences between children in order to explore the factors that influence such variations. A task on which all children perform in exactly the same way, no matter what age they are or whatever their background, is unlikely to be of much interest to any researchers.
So one concern for those designing tasks is to ensure that children’s performance shows sufficient variation from one individual to another, and sufficient variation between children of different ages.
An aspect that researchers have to bear in mind is to avoid what are called floor and ceiling effects. These effects arise when the developmental tasks do not allow sufficient range in children’s responses to differentiate between some of the children at the upper or lower extremes of performance.
Thus, if one in three children always scores 100% on a particular task, and another one in three scores 0%, then the task is only discriminating among the group getting scores between 1 and 99%, only one-third of the population. The ceiling effect is where a substantial number of children score at a maximum level, and hence it is not possible to distinguish between them in terms of their performance on the task. The floor effect refers to the group at the other end of the distribution, those children who all score 0% and hence cannot be distinguished between either.
One example of why this may be important: an intervention to improve the reading ability of poor readers might fail to show effects because the test used to assess reading ability has a marked floor effect. Children whose reading ability genuinely improved as a result of the intervention might still score at or around zero, because the test does not discriminate well between poor readers, even though it may discriminate well between average and better-than-average readers.
Writing in Children's Minds, Margaret Donaldson (1978) suggested that when children face tasks set for them by researchers, they find such tasks much easier if the tasks make some sort of human sense. In the ‘Three Mountains’ task, it can be seen that a child’s performance on a task cannot easily be separated from the context in which the task is located. Modern ideas of situated cognition, such as those put forward by Clancey (1994) in ‘Situated cognition: how representations are created and given meaning’, stress this point.
In addition, it is easy to assume that children understand language in the same way as adults, and thus that they will find it as easy to follow instructions or to respond to questions as adults do. However, it often seems that children place more reliance on non-verbal cues, or on the context of a question, than adults would. In the assessment of conservation, the context of the question can have a powerful effect.
These factors can affect the validity of the assessment, and affect the relation between competence and performance. Good standardised tests attempt to avoid these problems.
A crucial consideration for the development and use of tests, tasks and assessments is raised by the recognition that cognitive development does not take place independently of its cultural context (as observed in Warren’s (1979) ‘Cultural variation and commonality in cognitive development’), and that other aspects of development, social, emotional and even perceptual, are also dependent on the nature of children’s environments.
Recognising these issues, it is clearly critical that developmental research should ensure that the language, materials, setting and other aspects of data collection from children are appropriate to their cultural backgrounds. This is a difficult issue as tests are often designed to be administered in a specific way. However, in some circumstances particular tests work across different cultures.
In ‘Are cognitive processes universal?’, Dasen (1977) pointed out that an African village child who has never seen such things before, given a short plastic tube and a chain of paper clips, nevertheless makes exactly the same actions as a child in Paris when trying to pass the chain through the tube. So, clearly, we must not make assumptions about cultural specificity or universality without testing whether they are justified.
A general term that sums this up is ecological validity. This points to the need to examine whether what is being asked of a child in a psychological task is valid for their ecology, in other words, their social and cultural milieu. To do so means stepping outside our own cultural and social frame and doing our best to overcome the tendency to ethnocentrism, that is, believing that the world as we experience it is a primary reference point and that other ways of being and seeing are deviations or aberrations from our own.
It is salutary to recognise that even something that seems so basic as the idea of a competition in which there are a winner and losers is an alien idea in some cultures, for example the Inuit of North America, where striving is valued greatly above winning.
Hence the response of an Inuit child in a psychological assessment that asks whether a child who has won a game is happy, might validly be that they are unhappy, based on the underlying premise that satisfaction comes from trying hard, not from winning. Would that Inuit child be any less ‘emotionally intelligent’ than a British child who says ‘of course the winner’s happy, because they won, and they didn’t even have to try very hard’?
Clancey, W. J. (1994) Situated cognition: how representations are created and given meaning, in Lewis, R. and Mendelsohn, P., (eds) Lessons from Learning, pp. 231-242, Amsterdam, North Holland.
Dasen, P. R. (1977) Are cognitive processes universal? A contribution to cross-cultural Piagetian psychology, in Warren, N. (ed.), Studies in Cross-cultural Psychology, vol. 1, pp. 155-201, London, Academic Press.
Donaldson, M. and Lloyd, P. (1974) Sentences and situations: children’s judgments of match and mismatch, in Bresson, F. (ed.) Problèmes Actuels en Psycholinguistique, Paris, Presses Universitaires de France.
Donaldson, M. (1978) Children’s Minds, London, Fontana.
Warren, N. (1979) Cultural variation and commonality in cognitive development, in Oates, J. (ed.) Early Cognitive Development, London, Croom Helm.
Methods of studying children
The collection of articles, videos, photos and audio exploring child development has been made possible by a partnership between the British Psychological Society and The Open University Child and Youth Studies Group.