Skip to main content
Printable page generated Monday, 4 March 2024, 8:39 PM
Use 'Print preview' to check the number of pages and printer settings.
Print functionality varies between browsers.
Unless otherwise stated, copyright © 2024 The Open University, all rights reserved.
Printable page generated Monday, 4 March 2024, 8:39 PM

Health Management, Ethics and Research Module: 15.  Sampling Methods and Sample Size in Small-Scale Research

Study Session 15  Sampling Methods and Sample Size in Small-Scale Research


In Study Session 14 you learned some of the basic study designs that you could use to start to do some small-scale health research in your community. In this study session you will learn about the next stages of doing research and why this may be relevant for your work as a Health Extension Practitioner. Of course your research should be able to find out lots of things about your entire community, but studying large numbers of people takes a lot of resources. It uses a lot of time and costs a lot of money. Studying the whole of the population of interest is therefore not usually possible. For this reason, researchers usually study a representative number of the whole population, which is called a sample. To achieve a representative sample for a research study, the people who will be studied (i.e. the subjects) have to be carefully selected using appropriate sampling methods.

In this study session you will learn about different types of sampling methods and how to determine the appropriate sample size (i.e. the number of subjects) needed to generate reliable results from your research. This will help you to extract information from any research on community health and healthcare interventions in your locality.

Learning Outcomes for Study Session 15

When you have studied this session, you should be able to:

15.1  Define and use correctly all of the key words printed in bold. (SAQs 15.1, 15.2, 15.3 and 15.4)

15.2  Identify and describe common probability and non-probability methods of sampling populations for research studies, and illustrate the reasons for choosing each method. (SAQs 15.1, 15.2, 15.3 and 15.4)

15.3  Decide on the best sampling method and the sample size appropriate for a study design in examples presented to you. (SAQs 15.1, 15.2, 15.3 and 15.4)

15.1  What is meant by sampling?

Sampling is the process of selecting a number of study subjects from a defined study population (i.e. the population being investigated). In most research projects it is not possible to include all the study population in the research design. Therefore, you need to look at a sample of individuals who will give you the necessary information that you can then apply to everyone in the study population. As you have already learned in Study Session 14, it is first necessary to define the study population being investigated and only then can you begin to think about how you might take a sample from it.

  • Why do you think that sampling may be necessary if you want to study health issues in your locality?

  • If it is not possible to study everybody in your locality, a representative sample of people has to be studied.

As you have learned in previous study sessions, study variables can be categorised as quantitative and qualitative, and the data you collect in a research study may also be categorised in this way. Your sampling methods should follow different techniques depending on whether the data is quantitative or qualitative. In this section you will learn about sampling methods for both types of data, and also how to avoid bias in the sampling process.

  • How might bias arise in data collection? (You may want to refer back to Study Session 12.)

  • Bias means that data is distorted in some way. This may happen if you are collecting data by interview and you prompt respondents to make particular answers. It can also happen if you ‘hand pick’ your study subjects, for example, by only choosing people who live nearby or people you know.

15.2  Why do you need a representative sample?

We turn now to consider why it is so important to make sure that your sample is representative of the study population. This is essential if you want to draw conclusions which are valid for the whole study population. This applies whenever you are conducting a quantitative survey, such as a cross-sectional, case-control or cohort study design. You can ensure that your sample is representative by using random selection of subjects from the population. Random sampling means sampling based on each individual in the population having the same chance (or probability) of being selected to be included in the sample. You will learn about probability sampling methods in Section 15.3.

You learned about cross-sectional, case-control and cohort studies in Study Session 14.

  • Why do you think that it is important to make sure that the sample that you study is representative of all the people in your locality?

  • If the sample is not representative then you will not be able to say anything about the rest of the community. You will just have studied a number of people from your community, but they will not necessarily represent the community as a whole.

For qualitative data it is not necessary to ensure that your sample is representative, because the purpose of the research is to learn about those individuals specifically, and their knowledge, beliefs and practices. Therefore, samples for qualitative research studies are usually selected using non-probability sampling methods (described in Section 15.4).

Next we will describe probability and non-probability sampling methods in detail, so you can see different methods of deciding how the members of your sample will be selected (sampling methods) and how many individuals must be included (sample size).

15.3  Probability sampling methods

Probability sampling involves using random selection procedures to ensure that each member of the sample is chosen on the basis of chance. All members of the study population should have an equal (or at least a known chance) of being included in the sample. For example, names drawn out of a hat or computer-generated random numbers are random selection methods. A probability sampling method is a process that protects your research from bias and ensures that you have a representative sample. Furthermore, it will help you to make meaningful statistical estimations when you analyse the results of your research.

  • Why do you think that random selection of people for your sample is so important?

  • If the sample selection is not done in a random manner, then the sample will not be representative of the rest of the community. It will probably include a bias in the selection, i.e. too many or too few people will be included who are not representative of the population as whole.

Probability sampling requires that a list of all study population members exists or can be compiled. This list is called the sampling frame. The following probability sampling methods will be discussed here:

  • Simple random sampling
  • Systematic sampling
  • Stratified sampling.

15.3.1  Simple random sampling

As the name suggests, simple random sampling is the simplest method of probability sampling. It means within a particular study population everyone has an equal chance of inclusion in the sample. It is considered ‘fair’ and therefore allows findings to be generalised to the whole population from which the sample was taken. It is sometimes called the ‘lottery method’ and is illustrated in Figure 15.1.

Figure 15.1  An example of simple random sampling of 10 subjects, represented by the red ‘stickmen’, selected at random from a total of 50 subjects using a lottery method. (Diagram: Jessica Aumann)

To use the simple random sampling method, it is necessary to have lists of all elements of the population to be studied. Therefore, to select a simple random sample you need to:

  • Make or search for an existing named or numbered list of all the members in the study population from which you want to take a sample.
  • Decide on the size of the sample you need (this will be discussed in Section 15.6).
  • Select the required number of subjects (also known as ‘sampling units’) using a lottery method so everyone has an equal chance of being selected. Case Study 15.1 illustrates one way of doing this.
Case Study 15.1  Selecting a simple random sample of students

A simple random sample of 25 students is to be selected from a school of 500 students (Figure 15.2). Using a list of all 500 students, each student is given a number (1 to 500), and these numbers are written on small pieces of paper. All the 500 papers are put in a box, after which the box is shaken vigorously to ensure randomisation. Then, 25 papers are taken out of the box, and the numbers are recorded. The students belonging to these numbers will constitute the simple random sample.

Figure 15.2  Random sampling of school students will provide a representative sample. (Photo: Pam Furniss)

15.3.2  Systematic sampling

In systematic sampling, individuals are chosen at regular intervals using a sampling frame to help you do this.You will recall from earlier in Section 15.2 that the sampling frame is a list of your entire study population. Figure 15.3 shows an example in which every tenth individual has been selected from the sampling frame i.e. the sampling interval is 10.

Figure 15.3  An example of systematic sampling of every tenth subject selected systematically from a total of 50 subjects. (Diagram: Jessica Aumann)

Ideally you should randomly select a number where you start selecting individuals from the list (in the example in Figure 15.3, the starting number is 1). Then, you will select your study subjects starting from that point, choosing whatever sampling interval seems appropriate. For example, you could decide to question people in every 10th house (sampling interval of 10), or selecting every 20th person (sampling interval of 20) from a list.

Systematic sampling is usually less time-consuming and easier to perform than simple random sampling. However, there is a risk of bias, as the sampling interval may accidentally coincide with a variation in the study population that you did not expect. For example, if you wanted to select a random sample of days on which to count Health Post attendance, systematic sampling with a sampling interval of seven days would be inappropriate.

  • Can you explain why?

  • All selected study days would fall on the same day of the week. If every study day was a Thursday, and Thursday is market day, then your study days would not be typical days and your sample would not be representative.

Case Study 15.2 presents an example of how to select a systematic sample. Read it carefully and then answer the question that follows it.

Case Study 15.2  A systematic sample of students

A systematic sample is to be selected from 1,200 students from the same school. The required sample size is 100. The study population is 1,200 and the sample size is 100, so a systematic sampling interval is found by dividing the study population by the sample size:

1,200 ÷ 100 = 12

The sampling interval is therefore 12.

The number of the first student to be included in the sample should be chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1 to 12. If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students have been selected.

  • What would the first four numbers selected be in the example in Case Study 15.2?

  • The first four numbers selected would be 6, 18, 30 and 42, because the first number is 6 and the next one is found by adding 12 to the previous number and so on.

15.3.3  Stratified random sampling

Sometimes you may need to include some small groups in your sample. For example, random or systematic sampling may not select enough individuals from under-represented subgroups for your research. How do you ensure that small groups are included in sufficient numbers to make the research valid? In such cases you may use a stratified sampling method. Stratified random sampling involves dividing your population into various subgroups and then taking a simple random sample within each group. This will ensure that your sample represents key subgroups of the population. Figure 15.4 illustrates this process.

Figure 15.4  An example of stratified sampling. (Diagram: Jessica Aumann)

The 50 subjects in Figure 15.4 have been stratified (divided) into two subgroups – one of 30 subjects (outlined in blue), and one of 20 subjects (outlined in green). A sample of 10 subjects has been selected, but they have not been picked entirely at random. Instead, 6 have been selected at random from the 30 blue subjects and 4 have been selected at random from the 20 green subjects, to ensure that the blue and green individuals are proportionately represented in the sample of 10 selected individuals.

Representation of the subgroups can be proportionate or disproportionate. For example, if you wanted to sample 100 farmers from a population of farmers in which 90% are male and 10% are female, a proportionate stratified sample would select 90 males and 10 females. But you may want to know more about the women farmers than is possible in a sample of only ten subjects. So you can select a disproportionate stratified sample, for example, you could select 50 males and 50 females. The example in Case Study 15.3 will help you to understand stratified sampling.

Case Study 15.3  Stratified sampling of households

A survey is conducted on household water supply in a district comprising 2,000 households, of which 400 (or 20%) are urban and 1,600 (or 80%) are rural. It is suspected that in urban areas the access to safe water sources is much more satisfactory than in rural areas (Figure 15.5). A decision is made to sample 200 households altogether, but to include 100 urban households and 100 rural households.

  • Is this sample a proportionate or a disproportionate stratified sample?

  • It is disproportionate. 100 urban households out of a total of 400 means that 1 in 4 (one quarter) of the urban households were included in the sample. 100 rural households out of a total of 1,600 means that only 1 in 16 rural households were sampled.

Figure 15.5  Rural water supply is frequently from unprotected sources. (Photo: Nancy Platt)

15.4  Non-probability sampling methods

Samples selected using non-probability sampling methods are not representative samples and their findings cannot be generalised to the whole study population from which the sample was taken. This is because the individuals in the sample are chosen by ‘hand picking and therefore the people in the study population do not each have an equal chance of being selected. You may use non-probability sampling methods for defined purposes, e.g. if you wanted to investigate knowledge, attitudes and practice regarding female genital mutilation (FGM) in your community. A non-probability type of sampling would be appropriate in this case because you would want to interview people who can shed light on this sensitive subject. The types of non-probability sampling methods that you might use in such a study include purposeful sampling, quota sampling and snowball sampling.

15.4.1  Purposeful sampling

Supposing you wanted to focus on a limited number of respondents who you had selected because you think their in-depth information will give good insight into an issue that little is known about. This would be a qualitative research study, and the appropriate method to use would be what is called purposeful sampling. Purposeful sampling involves the selection of a sample of individuals with a particular purpose in mind. Using purposeful sampling you would select subjects for specific reasons, such as:

  • They meet particular criteria of interest in your research, e.g. very poor compliers with anti-TB treatment; well-nourished children; women who use depo-provera for family planning, etc.
  • They show wide variations in their knowledge, attitudes or practice to a particular health issue, e.g. towards people living with HIV, or towards FGM or early marriage.
  • They have particular knowledge or expertise, e.g. traditional birth attendants or herbalists in your community.

Purposeful sampling can be very informative. However, this sampling method cannot produce results that can be generalised to the population as a whole, and it may be difficult to avoid personal bias or preference when you are selecting your sample. For example, supposing you want to investigate the use of particular herbs which are taken for pain relief in your kebele. You might decide to interview people who you know to be traditional herbalists. However, this might introduce a personal bias as you would only be able to ask those traditional herbalists who are known to you. Moreover, there may be herbs which are widely used in the community, but that are not recommended by the traditional herbalists. You would get a more accurate picture of the use of pain-relieving herbs in the community if you questioned a wider range of subjects – not just the herbalists.

15.4.2  Quota sampling

A quota is a defined number that must be included in a sample. Quota sampling is a method that ensures that a certain number of subjects from different subgroups with specific characteristics appear in the sample, so that all these characteristics are represented. For example, you may think that religion has a strong effect on attitudes toward family planning services, so you decide to include 25% of respondents from each of the four most common religious groups in your area (Orthodox Christians, Muslims, Protestants and Catholics) in your sample.

  • If you suspect that religion might influence use of condoms, then how might you apply quota sampling if you have just three main religious groups represented in the population of your kebele?

  • You would identify the religious groups and include a quota of 33% from each religious group in your sample, so that each group is equally represented.

15.4.3  Snowball sampling

Snowball sampling is often used when working with populations that are not easily identified or accessed. The process involves building up a sample through referrals. You start with one or two key individuals who you believe know a lot about the subject you are investigating, and you ask them if they know other people who also know a lot about the topic of interest (Figure 15. 6). You then ask those individuals for further recommendations of others who may know a lot about the topic, and so on. In this way a sizeable sample may be obtained. It is called ‘snowball’ sampling because when a small ball of snow begins to roll down a snow-covered hillside, it gathers more snow as it travels and gets larger and larger over time – just like the snowball sample in a research study.

Figure 15.6  Snowball sampling starts with one or two key individuals who then put you in touch with others. (Photo: Janet Haresnape)
  • Supposing you wanted to investigate a particular type of rare birth defect. You have only come across two mothers in your kebele who have children with this defect. How might you investigate this defect?

  • You could use snowball sampling for investigating this birth defect. You could start by interviewing the two mothers who are known to you, and ask them if they know of other mothers with children with the same birth defect. If they do, you could locate these mothers and ask them if they know of other who are similarly affected. In this way it may be possible to build up a sizeable number of mothers to include in your sample.

15.5  Census sampling

You learned in Study Sessions 10 and 14 that if a survey covers the total population it is called a census. The national census takes place in most countries every five or ten years, and includes some questions about the health status of the respondents. Such a census might involve asking questions about the:

  • total amount of illness in the population
  • amount of illness caused by a specified disease
  • nutritional status of the population
  • utilisation of existing healthcare facilities and demand for new ones
  • distribution in the population of particular characteristics, for example, breastfeeding or contraceptive practices.

The information obtained from a census might be used to examine the relationship between one or more of the factors investigated or the cause of a particular disease.

If you are planning a health survey in your community in which all members of the population are interviewed, this is called census sampling. It requires you to define the boundary that your study will cover (e.g. the boundary of your kebele) and then interview all the people within it. Census sampling can be used in organisations, schools and rural communities where boundaries may be easily defined.

  • Suppose you wanted to use census sampling to investigate the incidence of malnutrition in schoolchildren in your kebele. Who would you include in your investigation?

  • You would include all the children in each of the schools in your kebele.

15.6  Sample size

After defining your study population and identifying your study design, you may ask how many people you need to include in your sample. The answer to this question depends on the nature of your research and the type of data you intend to collect.

If you want to work with qualitative data, and your main objective is to find out more about a particular problem but without seeking to generalise your findings to the entire study population, then the size of your sample does not matter. But if you want to work with quantitative data from a sample of people, and you want to use the findings to generalise to the wider population, then it is best to use as large a sample as possible, within available time and cost constraints. The logic is that the larger the sample, the more likely it is to be representative of the entire population, and therefore more reliable for generalising your findings to the population as a whole.

To work out the appropriate sample size for your study, there are many statistical procedures that can be used. It is not the objective of this Module to introduce these procedures and you do not need to know them at this level. Nevertheless, you may use the information in Table 15.1 as a rough guide to calculate an appropriate sample size for an investigation of a particular study population.

Table 15.1  Rough guide showing the required sample size for a particular size of study population.
Study population size (number)Required sample size (95% confidence level)

The confidence level is an estimate of how certain you can be about the conclusions from your analysis, if the sample is the appropriate size. The recommended sample sizes given in Table 15.1 are those which give a confidence level of 95%, which means that in 95% of cases the sample you have selected will be large enough to be representative of the study population as a whole. Therefore, a 95% confidence level would enable you to have high confidence in the conclusions drawn from the results of your research on a sample of the specified size. This means that your sample would be representative and your findings could be generalised to the whole study population from which the sample was selected.

  • Supposing you have about 500 women who are past childbearing age in your kebele and you want to find out the average number of children they had during their childbearing years. You decide not to interview all 500 of these women, but to select a sample from this group to interview. Look at Table 15.1. Approximately how many women should you interview in order to be confident that the average number of children born to your sampled women is representative of the population as a whole?

  • You would need to interview around 217 women from the population of 500 to ensure that they are representative of the whole population of women past childbearing age.

In the final study session in this Module, you will find an extended case study that brings together many of the management, ethical and research issues you have met in this and earlier study sessions.

Summary of Study Session 15

In Study Session 15, you have learned that:

  1. Sampling is a process of selecting study subjects from a defined population.
  2. To generalise results from a sample to a whole study population it is necessary for the sample to be representative.
  3. Representativeness of a sample can be ensured by using probability sampling methods. This uses random selection of study subjects and ensures an equal chance (or probability) of any subject being selected in the sample for the research study.
  4. Commonly used probability sampling methods are simple random sampling, systematic random sampling and stratified random sampling.
  5. Non-probability sampling methods are used to select a sample of study subjects for qualitative research into their knowledge, attitudes and practices. This method uses some form of selection criteria and the results cannot be generalised to the population as a whole.
  6. If a sampling method includes all of the population within a defined boundary it is called a census and the sampling method is called census sampling.
  7. In determining the appropriate sample size for a research study you should consider whether you need to be able to generalise from the findings and what type of data you plan to collect. The sample size must be large enough to be representative of the population as a whole if the research is quantitative and the sample has been chosen using a probability sampling method. The sample size is not relevant if the research is qualitative and the sample has been chosen by a non-probability sampling method.

Self-Assessment Questions (SAQs) for Study Session 15

Now that you have completed this study session, you can assess how well you have achieved its Learning Outcomes by answering the following questions. Some of the questions address Learning Outcomes from Study Session 14 as well as those from Study Session 15. Write your answers in your Study Diary and discuss them with your Tutor at the next Study Support Meeting. You can check your answers with the Notes on the Self-Assessment Questions at the end of this Module.

SAQ 15.1 (tests Learning Outcomes 15.1, 15.2 and 15.3)

Look again at Case Study 14.3 in the previous study session and then answer the following questions:

  • a.What is meant by the word ‘sampling’?
  • b.Describe what type of sampling method you will use to select your study subjects for the research described in Case Study 14.3.
  • a.Sampling is the process of selecting a number of study subjects from a defined study population.
  • b.Simple random sampling or systematic random sampling methods could be used to select mothers and children for the research outlines in Case Study 14.3.

Now read Case Study 15.4, and then answer the questions that follow it.

Case Study 15.4  Sofia is concerned about the rise in low birthweight babies

Sofia is a Health Extension Worker serving in a village where there are approximately 400 mothers who have given birth in the past five years. She suspects that the number of low birthweight babies is increasing and she would like to know more about the physical and socioeconomic conditions of the mothers to see if corrective actions should be taken. The Health Post records at present are not complete enough to draw conclusions and she does not have enough time or resources to interview every mother in the kebele. But she would like to study the problem, so she decides to take a sample of mothers to investigate.

SAQ 15.2 (tests Learning Outcomes 15.1, 15.2 and 15.3)

  • a.What does Sofia need to consider before she starts her research project?
  • b.What list does she need to have before she can select her sample?
  • a.Sofia needs to consider how many mothers she will investigate (the sample size) and what questions she will ask them.
  • b.Before she can select her sample, she needs a list of all the mothers in the kebele. This list is her sampling frame from which she can take her sample.

SAQ 15.3 (tests Learning Outcomes 15.1, 15.2 and 15.3)

  • a.What type of sampling method and what sample size is appropriate for the study Sofia wants to conduct in Case Study 15.4?
  • b.Using your knowledge from Study Session 14, what type of study design should Sofia use?
  • a.The appropriate sampling method is stratified random sampling, because she would like to know more about the physical and socioeconomic conditions of the mothers who have given birth in the past five years. The appropriate sample size would be close to 200 mothers from the total population of 400.
  • b.She could use a cross-sectional study design based on a random stratified sample, because she has neither the time nor the resources to interview all 400 recent mothers.

SAQ 15.4 (tests Learning Outcomes 15.1, 15.2 and 15.3)

Imagine you want to find out about contraceptive use by unmarried girls aged under 18 years in your kebele. However, you do not have an accurate listing of all the girls in this age group and you suspect that very few are using contraception.

  • a.How might you obtain a large enough sample of girls who are using contraception?
  • b.What ethical considerations would need to be taken into account in a study of this subject?
  • a.You would try to identify one or two girls in this age-group who are using contraception and ask them if they would agree to be interviewed by you, and if they know of any other girls in a similar situation who can be referred to you. Then you would ask the girls they referred to you if they know about anyone else who could be interviewed for your study, and so on. Using this snowball sampling method could build up a large enough sample to generate useful results about contraceptive use among young unmarried women.
  • b.You would have to ensure absolute confidentiality about who you interviewed and what they reported to you about such a sensitive subject as contraception among unmarried girls.