In Study Session 14 you learned some of the basic study designs that you could use to start to do some small-scale health research in your community. In this study session you will learn about the next stages of doing research and why this may be relevant for your work as a Health Extension Practitioner. Of course your research should be able to find out lots of things about your entire community, but studying large numbers of people takes a lot of resources. It uses a lot of time and costs a lot of money. Studying the whole of the population of interest is therefore not usually possible. For this reason, researchers usually study a representative number of the whole population, which is called a sample. To achieve a representative sample for a research study, the people who will be studied (i.e. the subjects) have to be carefully selected using appropriate sampling methods.
In this study session you will learn about different types of sampling methods and how to determine the appropriate sample size (i.e. the number of subjects) needed to generate reliable results from your research. This will help you to extract information from any research on community health and healthcare interventions in your locality.
When you have studied this session, you should be able to:
15.1 Define and use correctly all of the key words printed in bold. (SAQs 15.1, 15.2, 15.3 and 15.4)
15.2 Identify and describe common probability and non-probability methods of sampling populations for research studies, and illustrate the reasons for choosing each method. (SAQs 15.1, 15.2, 15.3 and 15.4)
15.3 Decide on the best sampling method and the sample size appropriate for a study design in examples presented to you. (SAQs 15.1, 15.2, 15.3 and 15.4)
Sampling is the process of selecting a number of study subjects from a defined study population (i.e. the population being investigated). In most research projects it is not possible to include all the study population in the research design. Therefore, you need to look at a sample of individuals who will give you the necessary information that you can then apply to everyone in the study population. As you have already learned in Study Session 14, it is first necessary to define the study population being investigated and only then can you begin to think about how you might take a sample from it.
Why do you think that sampling may be necessary if you want to study health issues in your locality?
If it is not possible to study everybody in your locality, a representative sample of people has to be studied.
As you have learned in previous study sessions, study variables can be categorised as quantitative and qualitative, and the data you collect in a research study may also be categorised in this way. Your sampling methods should follow different techniques depending on whether the data is quantitative or qualitative. In this section you will learn about sampling methods for both types of data, and also how to avoid bias in the sampling process.
How might bias arise in data collection? (You may want to refer back to Study Session 12.)
Bias means that data is distorted in some way. This may happen if you are collecting data by interview and you prompt respondents to make particular answers. It can also happen if you ‘hand pick’ your study subjects, for example, by only choosing people who live nearby or people you know.
We turn now to consider why it is so important to make sure that your sample is representative of the study population. This is essential if you want to draw conclusions which are valid for the whole study population. This applies whenever you are conducting a quantitative survey, such as a cross-sectional, case-control or cohort study design. You can ensure that your sample is representative by using random selection of subjects from the population. Random sampling means sampling based on each individual in the population having the same chance (or probability) of being selected to be included in the sample. You will learn about probability sampling methods in Section 15.3.
You learned about cross-sectional, case-control and cohort studies in Study Session 14.
Why do you think that it is important to make sure that the sample that you study is representative of all the people in your locality?
If the sample is not representative then you will not be able to say anything about the rest of the community. You will just have studied a number of people from your community, but they will not necessarily represent the community as a whole.
For qualitative data it is not necessary to ensure that your sample is representative, because the purpose of the research is to learn about those individuals specifically, and their knowledge, beliefs and practices. Therefore, samples for qualitative research studies are usually selected using non-probability sampling methods (described in Section 15.4).
Next we will describe probability and non-probability sampling methods in detail, so you can see different methods of deciding how the members of your sample will be selected (sampling methods) and how many individuals must be included (sample size).
Probability sampling involves using random selection procedures to ensure that each member of the sample is chosen on the basis of chance. All members of the study population should have an equal (or at least a known chance) of being included in the sample. For example, names drawn out of a hat or computer-generated random numbers are random selection methods. A probability sampling method is a process that protects your research from bias and ensures that you have a representative sample. Furthermore, it will help you to make meaningful statistical estimations when you analyse the results of your research.
Why do you think that random selection of people for your sample is so important?
If the sample selection is not done in a random manner, then the sample will not be representative of the rest of the community. It will probably include a bias in the selection, i.e. too many or too few people will be included who are not representative of the population as whole.
Probability sampling requires that a list of all study population members exists or can be compiled. This list is called the sampling frame. The following probability sampling methods will be discussed here:
As the name suggests, simple random sampling is the simplest method of probability sampling. It means within a particular study population everyone has an equal chance of inclusion in the sample. It is considered ‘fair’ and therefore allows findings to be generalised to the whole population from which the sample was taken. It is sometimes called the ‘lottery method’ and is illustrated in Figure 15.1.
To use the simple random sampling method, it is necessary to have lists of all elements of the population to be studied. Therefore, to select a simple random sample you need to:
A simple random sample of 25 students is to be selected from a school of 500 students (Figure 15.2). Using a list of all 500 students, each student is given a number (1 to 500), and these numbers are written on small pieces of paper. All the 500 papers are put in a box, after which the box is shaken vigorously to ensure randomisation. Then, 25 papers are taken out of the box, and the numbers are recorded. The students belonging to these numbers will constitute the simple random sample.
In systematic sampling, individuals are chosen at regular intervals using a sampling frame to help you do this.You will recall from earlier in Section 15.2 that the sampling frame is a list of your entire study population. Figure 15.3 shows an example in which every tenth individual has been selected from the sampling frame i.e. the sampling interval is 10.
Ideally you should randomly select a number where you start selecting individuals from the list (in the example in Figure 15.3, the starting number is 1). Then, you will select your study subjects starting from that point, choosing whatever sampling interval seems appropriate. For example, you could decide to question people in every 10th house (sampling interval of 10), or selecting every 20th person (sampling interval of 20) from a list.
Systematic sampling is usually less time-consuming and easier to perform than simple random sampling. However, there is a risk of bias, as the sampling interval may accidentally coincide with a variation in the study population that you did not expect. For example, if you wanted to select a random sample of days on which to count Health Post attendance, systematic sampling with a sampling interval of seven days would be inappropriate.
Can you explain why?
All selected study days would fall on the same day of the week. If every study day was a Thursday, and Thursday is market day, then your study days would not be typical days and your sample would not be representative.
Case Study 15.2 presents an example of how to select a systematic sample. Read it carefully and then answer the question that follows it.
A systematic sample is to be selected from 1,200 students from the same school. The required sample size is 100. The study population is 1,200 and the sample size is 100, so a systematic sampling interval is found by dividing the study population by the sample size:
1,200 ÷ 100 = 12
The sampling interval is therefore 12.
The number of the first student to be included in the sample should be chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1 to 12. If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students have been selected.
What would the first four numbers selected be in the example in Case Study 15.2?
The first four numbers selected would be 6, 18, 30 and 42, because the first number is 6 and the next one is found by adding 12 to the previous number and so on.
Sometimes you may need to include some small groups in your sample. For example, random or systematic sampling may not select enough individuals from under-represented subgroups for your research. How do you ensure that small groups are included in sufficient numbers to make the research valid? In such cases you may use a stratified sampling method. Stratified random sampling involves dividing your population into various subgroups and then taking a simple random sample within each group. This will ensure that your sample represents key subgroups of the population. Figure 15.4 illustrates this process.
The 50 subjects in Figure 15.4 have been stratified (divided) into two subgroups – one of 30 subjects (outlined in blue), and one of 20 subjects (outlined in green). A sample of 10 subjects has been selected, but they have not been picked entirely at random. Instead, 6 have been selected at random from the 30 blue subjects and 4 have been selected at random from the 20 green subjects, to ensure that the blue and green individuals are proportionately represented in the sample of 10 selected individuals.
Representation of the subgroups can be proportionate or disproportionate. For example, if you wanted to sample 100 farmers from a population of farmers in which 90% are male and 10% are female, a proportionate stratified sample would select 90 males and 10 females. But you may want to know more about the women farmers than is possible in a sample of only ten subjects. So you can select a disproportionate stratified sample, for example, you could select 50 males and 50 females. The example in Case Study 15.3 will help you to understand stratified sampling.
A survey is conducted on household water supply in a district comprising 2,000 households, of which 400 (or 20%) are urban and 1,600 (or 80%) are rural. It is suspected that in urban areas the access to safe water sources is much more satisfactory than in rural areas (Figure 15.5). A decision is made to sample 200 households altogether, but to include 100 urban households and 100 rural households.
Is this sample a proportionate or a disproportionate stratified sample?
It is disproportionate. 100 urban households out of a total of 400 means that 1 in 4 (one quarter) of the urban households were included in the sample. 100 rural households out of a total of 1,600 means that only 1 in 16 rural households were sampled.
Samples selected using non-probability sampling methods are not representative samples and their findings cannot be generalised to the whole study population from which the sample was taken. This is because the individuals in the sample are chosen by ‘hand picking’ and therefore the people in the study population do not each have an equal chance of being selected. You may use non-probability sampling methods for defined purposes, e.g. if you wanted to investigate knowledge, attitudes and practice regarding female genital mutilation (FGM) in your community. A non-probability type of sampling would be appropriate in this case because you would want to interview people who can shed light on this sensitive subject. The types of non-probability sampling methods that you might use in such a study include purposeful sampling, quota sampling and snowball sampling.
Supposing you wanted to focus on a limited number of respondents who you had selected because you think their in-depth information will give good insight into an issue that little is known about. This would be a qualitative research study, and the appropriate method to use would be what is called purposeful sampling. Purposeful sampling involves the selection of a sample of individuals with a particular ‘purpose’ in mind. Using purposeful sampling you would select subjects for specific reasons, such as:
Purposeful sampling can be very informative. However, this sampling method cannot produce results that can be generalised to the population as a whole, and it may be difficult to avoid personal bias or preference when you are selecting your sample. For example, supposing you want to investigate the use of particular herbs which are taken for pain relief in your kebele. You might decide to interview people who you know to be traditional herbalists. However, this might introduce a personal bias as you would only be able to ask those traditional herbalists who are known to you. Moreover, there may be herbs which are widely used in the community, but that are not recommended by the traditional herbalists. You would get a more accurate picture of the use of pain-relieving herbs in the community if you questioned a wider range of subjects – not just the herbalists.
A quota is a defined number that must be included in a sample. Quota sampling is a method that ensures that a certain number of subjects from different subgroups with specific characteristics appear in the sample, so that all these characteristics are represented. For example, you may think that religion has a strong effect on attitudes toward family planning services, so you decide to include 25% of respondents from each of the four most common religious groups in your area (Orthodox Christians, Muslims, Protestants and Catholics) in your sample.
If you suspect that religion might influence use of condoms, then how might you apply quota sampling if you have just three main religious groups represented in the population of your kebele?
You would identify the religious groups and include a quota of 33% from each religious group in your sample, so that each group is equally represented.
Snowball sampling is often used when working with populations that are not easily identified or accessed. The process involves building up a sample through referrals. You start with one or two key individuals who you believe know a lot about the subject you are investigating, and you ask them if they know other people who also know a lot about the topic of interest (Figure 15. 6). You then ask those individuals for further recommendations of others who may know a lot about the topic, and so on. In this way a sizeable sample may be obtained. It is called ‘snowball’ sampling because when a small ball of snow begins to roll down a snow-covered hillside, it gathers more snow as it travels and gets larger and larger over time – just like the snowball sample in a research study.
Supposing you wanted to investigate a particular type of rare birth defect. You have only come across two mothers in your kebele who have children with this defect. How might you investigate this defect?
You could use snowball sampling for investigating this birth defect. You could start by interviewing the two mothers who are known to you, and ask them if they know of other mothers with children with the same birth defect. If they do, you could locate these mothers and ask them if they know of other who are similarly affected. In this way it may be possible to build up a sizeable number of mothers to include in your sample.
You learned in Study Sessions 10 and 14 that if a survey covers the total population it is called a census. The national census takes place in most countries every five or ten years, and includes some questions about the health status of the respondents. Such a census might involve asking questions about the:
The information obtained from a census might be used to examine the relationship between one or more of the factors investigated or the cause of a particular disease.
If you are planning a health survey in your community in which all members of the population are interviewed, this is called census sampling. It requires you to define the boundary that your study will cover (e.g. the boundary of your kebele) and then interview all the people within it. Census sampling can be used in organisations, schools and rural communities where boundaries may be easily defined.
Suppose you wanted to use census sampling to investigate the incidence of malnutrition in schoolchildren in your kebele. Who would you include in your investigation?
You would include all the children in each of the schools in your kebele.
After defining your study population and identifying your study design, you may ask how many people you need to include in your sample. The answer to this question depends on the nature of your research and the type of data you intend to collect.
If you want to work with qualitative data, and your main objective is to find out more about a particular problem but without seeking to generalise your findings to the entire study population, then the size of your sample does not matter. But if you want to work with quantitative data from a sample of people, and you want to use the findings to generalise to the wider population, then it is best to use as large a sample as possible, within available time and cost constraints. The logic is that the larger the sample, the more likely it is to be representative of the entire population, and therefore more reliable for generalising your findings to the population as a whole.
To work out the appropriate sample size for your study, there are many statistical procedures that can be used. It is not the objective of this Module to introduce these procedures and you do not need to know them at this level. Nevertheless, you may use the information in Table 15.1 as a rough guide to calculate an appropriate sample size for an investigation of a particular study population.
Study population size (number) | Required sample size (95% confidence level) |
---|---|
30 | 28 |
100 | 80 |
500 | 217 |
1,000 | 278 |
5,000 | 357 |
10,000 | 370 |
50,000 | 381 |
100,000 | 383 |
1,000,000 | 384 |
The confidence level is an estimate of how certain you can be about the conclusions from your analysis, if the sample is the appropriate size. The recommended sample sizes given in Table 15.1 are those which give a confidence level of 95%, which means that in 95% of cases the sample you have selected will be large enough to be representative of the study population as a whole. Therefore, a 95% confidence level would enable you to have high confidence in the conclusions drawn from the results of your research on a sample of the specified size. This means that your sample would be representative and your findings could be generalised to the whole study population from which the sample was selected.
Supposing you have about 500 women who are past childbearing age in your kebele and you want to find out the average number of children they had during their childbearing years. You decide not to interview all 500 of these women, but to select a sample from this group to interview. Look at Table 15.1. Approximately how many women should you interview in order to be confident that the average number of children born to your sampled women is representative of the population as a whole?
You would need to interview around 217 women from the population of 500 to ensure that they are representative of the whole population of women past childbearing age.
In the final study session in this Module, you will find an extended case study that brings together many of the management, ethical and research issues you have met in this and earlier study sessions.
In Study Session 15, you have learned that:
Now that you have completed this study session, you can assess how well you have achieved its Learning Outcomes by answering the following questions. Some of the questions address Learning Outcomes from Study Session 14 as well as those from Study Session 15. Write your answers in your Study Diary and discuss them with your Tutor at the next Study Support Meeting. You can check your answers with the Notes on the Self-Assessment Questions at the end of this Module.
Look again at Case Study 14.3 in the previous study session and then answer the following questions:
Now read Case Study 15.4, and then answer the questions that follow it.
Sofia is a Health Extension Worker serving in a village where there are approximately 400 mothers who have given birth in the past five years. She suspects that the number of low birthweight babies is increasing and she would like to know more about the physical and socioeconomic conditions of the mothers to see if corrective actions should be taken. The Health Post records at present are not complete enough to draw conclusions and she does not have enough time or resources to interview every mother in the kebele. But she would like to study the problem, so she decides to take a sample of mothers to investigate.
Imagine you want to find out about contraceptive use by unmarried girls aged under 18 years in your kebele. However, you do not have an accurate listing of all the girls in this age group and you suspect that very few are using contraception.