Health Management, Ethics and Research: 12. Data Collection and Analysis for Your Baseline Community Survey

Printable page generated Tuesday, 21 July 2026, 9:42 PM

Unless otherwise stated, copyright © 2026 The Open University, all rights reserved.

Study Session 12 Data Collection and Analysis for Your Baseline Community Survey

Introduction

Study Sessions 10 and 11 have given you some background information about the community survey which you will be undertaking in your kebele. In this study session you will learn techniques of data collection and how to manage and analyse data.

You need to approach your community survey in a systematic and organised way. If data are collected haphazardly, they will be of little value to you or the community. The first step, before you start collecting data, is to plan your survey and prepare resources such as data collection forms. The forms and other records need to be standardised so that you collect information uniformly from all the respondents. This is particularly important if some of the data is being collected by volunteers in your community; you need to ensure they all follow the same procedures. The need for good organisation continues after the initial data collection stage; for example, the completed forms will need to be stored in an organised way (Figure 12.1).

Figure 12.1 Storing health service data in an organised way using files. (Photo: I-TECH/Julia Sherburne)

Learning Outcomes for Study Session 12

When you have studied this session, you should be able to:

12.1 Define and use correctly all of the key words printed in bold. (SAQs 12.1 and 12.2)

12.2 Describe various techniques for collecting data and state their uses and limitations. (SAQ 12.3)

12.3 Explain how bias can occur in data collection and how it can be avoided. (SAQs 12.1 and 12.4)

12.4 Describe basic concepts and procedures required for data analysis and interpretation. (SAQs 12.2 and 12.5)

12.5 Identify ethical issues involved in data collection as part of a community survey. (SAQ 12.6)

12.1 Collecting data

Data collection methods may vary according to whether you adopt a quantitative or qualitative approach. A quantitative approach to data collection usually uses structured questionnaires, while a qualitative approach uses unstructured interviews or discussions (see Section 12.1.2). If the purpose of the data collection is to assess how widespread a problem is, or how many people are affected by a disease, or if you want to use the data to describe a particular group of people, then you will need quantitative data. On the other hand, qualitative data may be more appropriate if your plan is to:

address an issue that is not well understood (e.g. people’s beliefs or perceptions)
provide a deeper understanding of an issue (such as how or why people are dying of HIV/AIDS)
ask the community members for their own perspectives and feedback.

You will also need to consider how the data will be processed, analysed and interpreted, otherwise collecting it will serve no purpose. Thinking about what you are going to do with the collected data before you start will help to ensure that nothing important is missed out. Other aspects to consider are how to fit the data collection into your work plan, whether there are cost implications and whether you have sufficient budget, and whether there might be any ethical considerations to address.

When you are planning your community survey, the first decision will involve the method of data collection to be used. Methods of collecting community survey data include:

observation
interviewing (face-to-face)
written questionnaires
focus group discussions.

This study session will introduce you to these methods of data collection.

12.1.1 Observation

Observation of human behaviour is a commonly used data collection technique; however, it is time consuming. It is most often used in small-scale surveys. The observation method of data collection simply means to gather information by your own direct observation without asking questions of the respondent. It is important to record your observations carefully using a checklist. The purpose of using a checklist is to make your observation as objective as possible, so that you note down what you see in a consistent way when you are observing different people.

12.1.2 Interviewing

Interviewing involves oral questioning of respondents, either individually or as a group. This is a face-to-face or personal interview method in which one person, the interviewer, asks questions of another person, the respondent. The questions are usually initiated by the interviewer who then records the responses, as shown in Figure 12.2.

Figure 12.2 Collecting data in an interview. (Photo: Yesim Tozan)

The collection of information through personal interviews is usually carried out in a structured way. Structured interviews involve the use of a set of predetermined questions in an interview schedule (list of questions) and use standard techniques of recording the respondent’s answers. These are usually written in a notebook (ideally a tape recorder would be used, but this is not always available). The interviewer asks the questions in a prescribed order and the respondent gives answers in their own words. The interviewer is allowed to ask ‘follow-up’ questions only if something the respondent says is not clear, or if the question wasn’t understood, but otherwise keeps to the questions on the interview schedule. An example of a possible structured interview question and a follow-up question are given below:

‘If you or a female relative is expecting a baby, would you prefer the labour and delivery to be at home or in the Health Post? Can you say why?’

Note that when presenting questions like the one above, it is important not to ‘prompt’ the respondent (i.e. suggest or hint at a possible answer) because this might influence their response. They may try to give you the answer they think you want to hear.

In contrast, unstructured interviews are characterised by a flexibility of approach to questioning. An example of an unstructured interview question is given below:

‘Please tell me about giving birth to your first child.’

In unstructured interviews you do not follow a system of pre-determined questions, but simply begin a conversation with the respondent on a particular topic. The respondent is free to explore the topic in their own words and in their own way, without being restricted by specific questions that must be answered. The interviewer can prompt the respondent to say more with phrases such as ‘Tell me more about that’ or ‘This is interesting – please go on’, but does not ask specific questions about the topic.

12.1.3 Written questionnaires

A written questionnaire is a data collection tool in which written questions are presented to be answered by the respondents in written form. The questions are directed towards collecting simple factual information, which can be answered either by writing a few words on the questionnaire, or ticking a box next to the chosen answer from a list of options. You can use this form of data collection in many different ways, for example:

Through mailing to respondents who are asked to post their responses back to you.
Gathering your respondents in one place at one time, giving oral or written instructions, asking them to fill out the questionnaires and collecting them when completed.
Delivering your questionnaires to the respondents by hand and collecting them later.

As with questions presented in interviews, the questions on a written questionnaire can be either structured or unstructured, but they are always simple to answer directly. Questionnaires do not usually seek complex information about people’s attitudes, beliefs or preferences, or explanations about why they behave in a certain way. (Complex information is best collected through interviews or focus groups.)

In a written questionnaire, the following question was asked:
From which of the following sources do you get your water? Tick all options that apply to you.
A  Well
B  River
C  Pond
D  Standpipe
E  Another source
Is this a structured or unstructured question?
It is a structured question because a rigid choice of answers is presented and the respondent must choose from them.

How would you ask this same question in an unstructured way? How are the answers recorded, and what further questions might this enable you to ask?
You could ask ‘Where do you get your water from?’ This is an unstructured question because there are no prepared responses already written down. The respondent either writes their answer in their own words on the questionnaire, or the interviewer writes it for them on the questionnaire. Further questions you may have thought of might include:
A  ‘How far do you have to go to collect your water?’
B  ‘How often do you collect water?’
C  ‘How long does it take you to collect water?’
The unstructured question therefore enables you to explore the respondent’s answer further. Note that all the questions require very simple factual answers, e.g. (in the example above) the answers might be:
A ‘Two kilometres’,
B ‘Once a day’,
C ‘Two hours’.

Table 12.1 summarises the advantages and disadvantages of the methods of collecting data that you have learned about so far.

Table 12.1 Advantages and disadvantages of different data collection techniques.
Technique	Advantages	Disadvantages
Observation	Gives detailed information in a particular context; permits collection of information which may not be appropriate to ask in a questionnaire	Ethical issues concerning confidentiality or privacy may arise; observer bias may occur (observer may only notice what interests him or her); the presence of the data collector can influence the situation being observed
Interviewing	Suitable for use with illiterate people; permits clarification of questions; has higher response rate than written questionnaires	The presence of the interviewer can influence responses; reports of events may be less complete than information gained through observations
Written questionnaires	Not expensive; permits anonymity and may result in more honest responses; is not labour-intensive, so does not require assistants; eliminates bias as questions are phrased in the same way for all respondents	Cannot be used with illiterate respondents (unless they are helped); there is often a low rate of response; questions may be misunderstood

12.2 Focus group discussions

A focus group discussion is a loosely structured interview conducted by an experienced moderator with a small number of people who all sit together at the same time in the same place. For a focus group discussion the participants will be guided through an unstructured, spontaneous discussion on a particular topic. The information obtained is qualitative data.

12.2.1 Ideal characteristics of a focus group discussion

The ideal characteristics for a focus group are as follows:

The group consists of eight to twelve members.
The people in the group are similar in terms of demographics and socioeconomic factors (e.g. consisting of all women, or all adolescents, etc.) but are likely to have a range of different views.
Discussion generally lasts for 90 minutes to two hours.
The moderator has experience of the issues being discussed.
Conversation should be videoed and/or audio-taped, or notes taken.
Emphasis is on the interaction between the group members, rather than their individual perspectives.
The goal is not for everyone to reach agreement; instead, the aim is for the participants to reflect on the discussion topics, present their opinions, and respond to the comments of other group members.

12.2.2 The value of focus groups

Focus group discussions can offer an effective qualitative data collection method for a number of reasons. They are good for generating ideas; for example, they may act as a starting point for introducing a new product (e.g. condom) or discussion of ideas, uses or improvements. Focus group discussions can reveal community needs, perceptions and attitudes to health services that are currently provided. They can therefore be used to assess needs and gaps, and enable the service-provider team to rethink the way they operate in order to improve the service. The discussions can also be useful for evaluating programmes and guiding programme development.

The qualitative information obtained from focus group discussions is likely to be in the form of written or spoken text. The best way to analyse such information is generally to try to identify central concepts or themes which came out of the discussions. The qualitative information obtained from such discussions may complement data collected by quantitative methods.

12.3 Bias in data collection

If you ‘hand pick’ your study subjects when you are collecting data, then it is likely that you are introducing bias in your study. Bias in data collection is a distortion which results in the information not being truly representative of the situation you are trying to investigate. Sources of bias can be prevented by carefully planning the data collection process.

Can you think of a way that bias might be accidentally introduced into a survey?
In interviews, when you are asking questions, it is important not to prompt respondents into giving particular answers because this could introduce a source of bias.

To avoid bias you need to collect data as objectively as possible, for example, by using well-prepared questions that do not lead respondents into giving a particular answer. If you are selecting a sample of people for your research (i.e. not including everyone) then you must ensure the sample is representative of the population or group you are studying. If you are using volunteers to help in collecting data, you should ensure that everyone is collecting and recording data in the same way and that they all understand the need to avoid prompting the respondents towards particular answers.

Once you have collected your data, you are ready to start processing and analysing it.

12.4 Data processing and checking for errors

Data processing refers to recording or entering your data (e.g. on to a master sheet or computer), and data checking and correcting. You may be concerned about the quality of some of the data which has been collected. For example, some of your data will probably have been collected by the volunteers who are helping you and it is possible that some may not clearly understand the objective of the data collection, and may be recording it in different ways. It is important to check your data for consistency and missing values as you collect it, and once collected, check again for errors.

No matter how carefully the data have been collected, some errors are inevitable. Errors (mistakes) can result from incorrect reading of the data, incorrect reporting, incorrect filing or incorrect typing. In addition, the data entered may be incomplete (some of the data was never collected, or has been lost). The aim of the checking process is therefore to produce a reliable set of data that you can be confident is accurate for the purposes of your analysis.

Once the data has been checked for errors and completeness, all the answers of individual respondents are entered on a data master sheet. An example is shown in Table 12.2.

Table 12.2 Data master sheet showing individual answers to eight questions, Q 1 to Q 8.
Individual respondent no.	Q 1 Gender	Q 2 Ethnicity	Q 3 Age	Q 4 Education	Q 5 Marital status	Q 6 Occupation	Q 7 House type	Q 8 Water source
1	F	Oromo	35	Illiterate	Married	Farmer	Tukul	River
2	M	Tigre	67	7^th grade	Married	Merchant	Tukul	Protected well
3	M	Oromo	34	3^rd grade	Single	Farmer	Tukul	River
4	F	Amhara	33	Illiterate	Divorced	Farmer	Tukul	River
5	F	Wolayta	42	Illiterate	Single	Farmer	Tukul	Well
6	M	Sidama	23	5^th grade	Widowed	Labourer	Tukul	Well
7	F	Hadiya	56	6^th grade	Widowed	Housewife	Corrugated iron	Protected spring
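
The checks for missing values and errors described above can also be done by a short program once the master sheet is in electronic form. The following is only a minimal sketch in Python: the field names, the function name find_problems and the age limit of 120 years are assumptions made for illustration, and the 'faulty' record is an invented copy of respondent 3 with two deliberate mistakes, not real survey data.

```python
# Field names are hypothetical labels for the eight questions in Table 12.2.
REQUIRED_FIELDS = ["gender", "ethnicity", "age", "education",
                   "marital_status", "occupation", "house_type", "water_source"]

# The first two respondents, copied from Table 12.2.
master_sheet = [
    {"respondent": 1, "gender": "F", "ethnicity": "Oromo", "age": 35,
     "education": "Illiterate", "marital_status": "Married",
     "occupation": "Farmer", "house_type": "Tukul", "water_source": "River"},
    {"respondent": 2, "gender": "M", "ethnicity": "Tigre", "age": 67,
     "education": "7th grade", "marital_status": "Married",
     "occupation": "Merchant", "house_type": "Tukul",
     "water_source": "Protected well"},
]

# Invented for illustration only: respondent 3 with a mistyped age
# and a water source that was never filled in.
faulty_row = {"respondent": 3, "gender": "M", "ethnicity": "Oromo", "age": 340,
              "education": "3rd grade", "marital_status": "Single",
              "occupation": "Farmer", "house_type": "Tukul", "water_source": ""}


def find_problems(rows):
    """Return (respondent number, description) for every problem found."""
    problems = []
    for row in rows:
        number = row["respondent"]
        # Completeness check: every question must have an answer.
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append((number, "missing " + field))
        # Consistency check: age must lie in a plausible range (assumed 0 to 120).
        age = row.get("age")
        if isinstance(age, int) and not 0 <= age <= 120:
            problems.append((number, "implausible age " + str(age)))
    return problems


print(find_problems(master_sheet))                 # no problems expected
print(find_problems(master_sheet + [faulty_row]))  # two problems for respondent 3
```

A list of problems like this tells you which forms to go back to and check by hand; it does not correct the data for you.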

12.5 Data analysis

The data in Table 12.2 is for only seven people. Imagine how large the table would need to be for a whole community! Analysing data enables you to present information in a clearer and more useful way. Data analysis means describing and summarising your findings in an unbiased way. The results obtained from the analysis will not only help you to meet your community survey objectives, they will also enable you to:

Monitor and evaluate your activities and establish whether you have progressed as planned.
Assess the effect of your activities on the knowledge, perceptions, behaviour, and ultimately on the health, of the individuals within your community.
Share your results with interested stakeholders in your community and local government officials.

To analyse your data, you first need to identify the type of data you have. You may have collected quantitative or qualitative data. Qualitative data use names or descriptions to describe variables, while quantitative data usually use numbers. A variable is any measured characteristic or attribute that differs between different people, households, etc.

Give an example from Table 12.2 of quantitative data and an example of qualitative data.
An example of quantitative data would be the column listing the respondents’ age. An example of qualitative data would be ethnicity, occupation, house type or water source.

Several terms are used to describe types of variable. For some variables, called categorical variables, there are a limited number of possible responses that can be given, in other words, a limited number of categories. For example, ‘gender’ is a categorical variable because it has two categories: ‘male’ and ‘female’. Other variables, known as continuous variables, have lots of different possible responses, though usually within a certain range. For example, age is a continuous variable, within the range of a normal human lifespan.

Variables that are described by a number are, unsurprisingly, also known as numerical variables. For example, the number of new AIDS cases reported during a one-year period, the number of beds available in a particular hospital, or a person’s weight or temperature are all numerical variables.

Of the variables given in Table 12.2, gender is one categorical variable. Can you find another?
Another categorical variable would be marital status, because everyone can be categorised into single, married, divorced or widowed, or cohabiting (living together without being married).

‘Blood group’ is a variable. People may have one of four blood groups and these are A, B, AB and O. Is blood group a categorical or a continuous variable?
Blood group is a categorical variable because it has four categories. Each person has one of the four blood groups – A, B, AB or O.

At times, you may find it useful to transform numerical data into categorical data. You can do this by dividing the range of values of the variable into intervals, i.e. by grouping the data. For example, the numerical variable ‘age’ might be transformed into a categorical variable ‘age group’, which consists of categories such as under 30 years, 30–44, 45–59 and 60 years and over. This transformation is useful if the researcher is interested in the number of people falling into each of these four categories (Figure 12.3).

Figure 12.3 The age group of community members is often used in surveys. (Photo: Janet Haresnape)

Suppose you find that the ages of a group of people you interviewed about tuberculosis in your kebele are as shown in Table 12.3. How many of these people would be in each of the age groups under 21, 21–30, 31–40, 41–50, 51–60 and over 60? Put your answers in Table 12.4a. Which age category has the most people in it?
Your completed table should look like Table 12.4b below. The age group with the most people in it is the 21–30 years category, with 24 people.

Table 12.3 Ages of people interviewed about tuberculosis.
Age (years)	Number of people
19	2
20	3
21	4
22	3
23	4
24	4
25	3
26	2
28	3
30	1
32	3
35	3
38	3
45	1
49	1
55	1

Table 12.4a Age groups of people surveyed about tuberculosis (for completion).
Age group (years)	Number of people
under 21
21–30
31–40
41–50
51–60
over 60

Table 12.4b Age groups of people surveyed about tuberculosis (completed).
Age group	Number of people
under 21	5
21–30	24
31–40	9
41–50	2
51–60	1
over 60	0
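
The grouping of ages in Table 12.3 into the categories of Table 12.4b can be sketched in a few lines of Python. This is an illustrative sketch only: the names age_counts, age_group and groups are assumptions, and the group labels use a plain hyphen.

```python
# Ages and number of people, copied from Table 12.3.
age_counts = {19: 2, 20: 3, 21: 4, 22: 3, 23: 4, 24: 4, 25: 3, 26: 2,
              28: 3, 30: 1, 32: 3, 35: 3, 38: 3, 45: 1, 49: 1, 55: 1}


def age_group(age):
    """Return the age group label (as in Table 12.4b) for an age in years."""
    if age < 21:
        return "under 21"
    if age <= 30:
        return "21-30"
    if age <= 40:
        return "31-40"
    if age <= 50:
        return "41-50"
    if age <= 60:
        return "51-60"
    return "over 60"


# Start every group at zero so that empty groups still appear in the result.
groups = {label: 0 for label in
          ["under 21", "21-30", "31-40", "41-50", "51-60", "over 60"]}
for age, number_of_people in age_counts.items():
    groups[age_group(age)] += number_of_people

largest_group = max(groups, key=groups.get)

print(groups)
print("Largest group:", largest_group)
```

Note that the groups must not overlap and must not leave gaps, so that every age falls into exactly one category.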

12.6 Summarising quantitative data

We mentioned above that a complete set of raw (unanalysed) data from a whole community survey would be large and unmanageable. You need to summarise the findings so that they are useful to you and others. In this section, we will describe some of the most common methods for summarising quantitative data.

12.6.1 Frequencies

Frequency means the number of times an event occurs or the number of responses in a particular category. In other words, a frequency is a count of events in a given time frame. For example, if you report ‘Our Health Post sees 130 patients each month’, the frequency of patients seen is 130 per month.

Frequency data is often presented in tables, graphs or pie charts.

Suppose you find that in a particular area, 14 out of 25 adults aged under 30 years have had malaria, whereas 19 out of 25 adults between the ages of 30 and 50 years, and 20 out of 25 adults over 50 years, have had malaria. Present these data in the form of a table.

Your table should look something like Table 12.5.

Table 12.5 Age distribution of malaria cases from the imaginary example given above.
Age category	Number of people sampled	Number who have had malaria
over 50 years	25	20
30 to 50 years	25	19
under 30 years	25	14
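
Frequencies for a categorical variable are obtained simply by counting how many times each response occurs. As a minimal sketch, assuming the ‘Water source’ column of Table 12.2 has been typed into a Python list (the variable names are illustrative), the standard library class Counter does the counting:

```python
from collections import Counter

# Answers to Q 8 (water source) for the seven respondents in Table 12.2.
water_sources = ["River", "Protected well", "River", "River",
                 "Well", "Well", "Protected spring"]

# Counter tallies how many times each answer occurs.
frequencies = Counter(water_sources)

# List the sources from most to least frequent.
for source, count in frequencies.most_common():
    print(source, count)
```

The same approach works for any other categorical column of the master sheet, such as occupation or house type.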

12.6.2 Mean, median and mode

To summarise numerical variables, there are three measures that are commonly used: mean, median and mode. To explain how to proceed with these measures, let’s look at some examples.

The mean is the average of a series of measurements or scores. To calculate the mean, you add up all the individual measurements or scores and then divide this total by how many scores there are (it is the sum divided by the number of individual values). Although the mean is the most commonly used of the measures mentioned here, the median or the mode may sometimes be more appropriate. The median is a measure of central location, where half of the measures are below and the other half are above this value. The mode is the most common result (the most frequent value) of a test, survey or experiment.

For example, imagine a school exam taken by 10 students with possible scores from 0 to 100. Nine students score 95 but one person scores 5. The mean is calculated by adding up the total scores (9 × 95 + 5 = 860) and dividing by the number of scores (10), which gives a mean of 86. That one person with the low score really throws off the final statistic! The median, however, is 95 and in this case is a better description of how most people did in the exam. The mode would be the most common score which would also be 95 in this example. In this case the median or mode might be more useful than the mean.

Seven farmers in your kebele keep goats (Figure 12.4). You record how many goats each farmer has and the results are 8, 1, 3, 7, 1, 6 and 9. What is the mean number of goats owned by these farmers and what is the median number?
The mean is 5. It is the sum of the scores (35) divided by the number of farmers (7). If you put the numbers in order they are 1, 1, 3, 6, 7, 8, and 9. The middle value is 6 and therefore the median is 6.

Note the difference in the values between the mean and median. The mean or average can be influenced by extreme or outlying values at either end of the scale, but the median is not. If the number of values is even, there isn’t a middle value, so to calculate the median you take the mean of the two middle numbers.

For example, if there were only six farmers and the number of goats they owned were 10, 12, 14, 16, 18 and 20, the two middle numbers are 14 and 16, so the median is (14 + 16) divided by 2, which equals 15.

Figure 12.4 How many goats? (Photo: Janet Haresnape)

Supposing the numbers of goats owned by the seven farmers are 3, 4, 7, 7, 7, 9 and 10. What is the mode of the numbers of goats?
Looking at these scores, you can see that 7 is the most common number of goats because three farmers have 7 goats. The mode of the numbers (also referred to as the modal number) of goats is therefore 7.

Sometimes it is more appropriate to think about the modal number since this represents the most common situation.
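
If you have access to a computer, the worked examples in this section can be checked with the statistics module from Python's standard library. This is only an illustrative sketch; the variable names are assumptions.

```python
import statistics

# Exam example: nine students score 95 and one scores 5.
exam_scores = [95] * 9 + [5]
exam_mean = statistics.mean(exam_scores)      # sum of scores divided by 10
exam_median = statistics.median(exam_scores)  # middle of the ordered scores
exam_mode = statistics.mode(exam_scores)      # most frequent score

# Goats owned by seven farmers (odd number of values).
goats = [8, 1, 3, 7, 1, 6, 9]
goats_mean = statistics.mean(goats)
goats_median = statistics.median(goats)

# Six farmers (even number of values): the median is the mean
# of the two middle numbers, 14 and 16.
goats_six = [10, 12, 14, 16, 18, 20]
goats_six_median = statistics.median(goats_six)

# Seven farmers in the mode example.
goats_mode = statistics.mode([3, 4, 7, 7, 7, 9, 10])

print(exam_mean, exam_median, exam_mode)
print(goats_mean, goats_median, goats_six_median, goats_mode)
```

The exam example shows why the choice of measure matters: one very low score pulls the mean down to 86, while the median and mode both stay at 95.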

12.6.3 Proportion and percentage

A proportion, sometimes called relative frequency, is simply the number of times the observation occurs in the data, divided by the total number of responses. Proportions are very often converted to percentage values because this makes comparison easier between different sets of data. Percentage means the number of occurrences or responses, as a proportion of the whole, multiplied by 100. For example, if 30 people respond to a survey out of a total of 100, the frequency of respondents is 30, the proportion is 30/100 or 0.3, and the percentage of respondents is 30%. If the total number of people surveyed was only 60 and there were 30 respondents, the proportion is 30/60 or 0.5, and the percentage of respondents is 50%.

Approximately what percentage of the seven people whose answers are summarised in Table 12.2 are illiterate?
Three of the seven people are illiterate. So the percentage of people who are illiterate is 3/7 × 100% which is approximately 43%.

Go back to Table 12.5, which showed the number of people who have had malaria in different age categories, and add a column to show the percentage of each age category that have had malaria. Which age category has the highest percentage of people who have had malaria?

Your table should look something like Table 12.5 below.

Table 12.5 Age distribution of the number and percentage of people who have had malaria in an imaginary example.
Age group(years)	Number of people sampled	Number who have had malaria	Percentage who have had malaria
over 50	25	20	80%
30 to 50	25	19	76%
under 30	25	14	56%

To calculate the percentage, you take the number who have had malaria and divide it by the total number sampled, then multiply your answer by 100. For example, in Table 12.5, for those aged over 50 years, the calculation is 20 (who have had malaria), divided by 25 (people sampled) × 100%, which is 80%. This is the age group with the highest percentage of people who have had malaria.
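
The same calculation can be written as a short Python sketch. The function name percentage and the other names are assumptions chosen for illustration; the figures are those of the malaria example and of the illiteracy question above.

```python
def percentage(occurrences, total):
    """Return occurrences as a percentage of total."""
    return 100 * occurrences / total


# (age category, number sampled, number who have had malaria), from Table 12.5.
malaria = [("over 50", 25, 20), ("30 to 50", 25, 19), ("under 30", 25, 14)]

malaria_percentages = {category: percentage(cases, sampled)
                       for category, sampled, cases in malaria}
highest = max(malaria_percentages, key=malaria_percentages.get)

# Three of the seven respondents in Table 12.2 are illiterate.
illiterate_percentage = round(percentage(3, 7))

print(malaria_percentages)
print("Highest:", highest)
print("Illiterate:", illiterate_percentage, "%")
```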

Table 12.6 shows the percentage of women in each of four age groups in a certain population. It shows that more women fall in the age group 30–40 years than in any other category.

Table 12.6 Percentage of females by age group.
Age group	Number of women	Percentage of total
under 30	200	17
30–40	400	33
41–50	350	29
over 50	250	21
all ages	1200	100

In the example in Table 12.5, only 25 people in each age group were sampled. When reporting percentages, you should also always report how many observations there were. For example, if you say that 50% of women seen by the clinic this month had diabetes, it is important to know how many women were seen. If it is 50% of 500 women, this means that 250 women with diabetes were seen, but if it is 50% of two women, then only one woman with diabetes was seen!

12.6.4 Cumulative percentage

The cumulative percentage for a given category means the percentage of people who fall into that category, or a lower category. To work out the cumulative percentage for each category, you just have to add the percentage for that category to all of the percentages for the categories which are lower. Table 12.7 shows an example of cumulative percentages using the same data as in Table 12.6. It is a way of presenting the same data in a more descriptive way.

Table 12.7 Percentage and cumulative percentage of females by age group.
Age group	Number of women	Percentage of the total	Cumulative percentage
under 30	200	17	17
30–40	400	33	50
41–50	350	29	79
over 50	250	21	100
all ages	1200	100	100
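
The columns of Table 12.7 can be reproduced with the following Python sketch. It is an illustration under one stated assumption: the cumulative percentage is worked out from the running total of women rather than by adding the rounded percentages, which gives the same figures here and avoids building up rounding errors. The variable names are illustrative.

```python
from itertools import accumulate

# Age groups and number of women, from Table 12.7 (lowest category first).
age_groups = ["under 30", "30-40", "41-50", "over 50"]
women = [200, 400, 350, 250]
total = sum(women)

# Percentage of the total for each group, rounded to a whole number.
percentages = [round(100 * n / total) for n in women]

# Running total of women up to and including each group,
# expressed as a percentage of all women.
cumulative = [round(100 * running / total) for running in accumulate(women)]

for group, n, pct, cum in zip(age_groups, women, percentages, cumulative):
    print(group, n, pct, cum)
```

The final cumulative percentage should always be 100, which is a useful check that no category has been left out.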

12.7 Ethical considerations

You have learned about the ethical issues that you need to be aware of in your role as a Health Extension Practitioner in Study Sessions 7, 8 and 9. These issues must also be considered in the context of research. There are many established codes of practice that cover the ethics of research. These are codes that protect the rights of respondents either in research or in a community survey. Some of the widely accepted ethical principles include:

The study should be conducted appropriately and data analysed in an unbiased way.
Findings should be presented honestly; investigators should not fabricate data or distort their results.
Contributions of others should be acknowledged.
Investigators should not suppress unwanted findings.
Investigators should declare any conflicts of interest.

Furthermore, as we develop our data collection techniques, we need to consider whether our data collection procedures are likely to cause any physical or emotional harm. Harm may be caused, for example by:

Violating respondents’ right to privacy by posing sensitive questions or by gaining access to records which may contain personal data.
Allowing personal information that respondents would want to be kept private to be made public.
Failing to observe or respect certain cultural values, traditions or taboos valued by your respondents.

You will need to be aware of these ethical considerations when you collect data for your community survey or in other research. For example, in questionnaires, it may be advisable to omit names and addresses if sensitive questions are asked about such things as family planning or sexual practices, or about opinions of patients on the health services provided. Some other suggestions for dealing with difficult ethical considerations are:

Obtain informed consent from participants before the study or the interview begins.
Avoid exploring sensitive issues until you have established a good relationship with the respondent.
Ensure that the data obtained is kept confidential (Figure 12.5).
Ensure that the culture of respondents is respected during the data collection process.

Figure 12.5 Personal information must be kept confidential. (Photo: Janet Haresnape)

Summary of Study Session 12

In Study Session 12, you have learned that:

Collecting data for a community survey or other purpose needs to be carefully planned.
The main methods of data collection are observation, interviews, questionnaires and focus group discussions. Each of these methods has different advantages and limitations.
Data may be quantitative or qualitative. Quantitative data is appropriate if you want to quantify a health problem or to quantify background information about your community. Qualitative data is appropriate if you want to find out more detail about a particular community health problem.
While collecting data it is important to avoid bias. Questionnaires used in a baseline community survey should be used in a standard way to ensure that the data are reliable.
Data must be checked for errors and completeness, during and after collection.
Data analysis means describing and summarising the findings so they can be presented in a way that is understandable and useful. It should enable you to compare one set of data with another in a meaningful way.
Quantitative data can be summarised and presented using methods such as frequency, mean, median, mode, proportion and percentages.
Consideration should be given to ethical issues as data is collected.

Self-Assessment Questions (SAQs) for Study Session 12

Now that you have completed this study session, you can assess how well you have achieved its Learning Outcomes by answering the following questions. Write your answers in your Study Diary and discuss them with your Tutor at the next Study Support Meeting. You can check your answers with the Notes on the Self-Assessment Questions at the end of this Module.

SAQ 12.1 (tests Learning Outcomes 12.1 and 12.3)

Explain what is meant by bias in the collection of data, and why it should be avoided.

Answer

Bias is a distortion of information during data collection. Biased data do not show the true situation that you are trying to investigate, so bias should be avoided wherever possible.

SAQ 12.2 (tests Learning Outcomes 12.1 and 12.4)

In a survey of ten households, the numbers of children in each family were found to be:

3, 1, 6, 4, 0, 3, 3, 5, 8, 4.

a. What is the mean number of children per household?
b. What is the median number?
c. What is the modal number?
d. What proportion of households has more than three children?
e. What percentage of households has more than four children?

Answer

a. The mean number of children per household is 3.7. To calculate the mean you add up all the numbers of children, which comes to 37, and divide by the number of households, which is 10.
b. The median number is 3.5. To calculate the median you rearrange the data in order: 0, 1, 3, 3, 3, 4, 4, 5, 6, 8. In this case, because there are an even number of records, there is no middle number so you have to take a mean of the two middle numbers, which are 3 and 4.
c. The modal number is 3. This occurs three times whereas other numbers occur no more than twice.
d. The proportion of families with more than three children is 5 out of 10. You could simplify this to say half the families have more than three children.
e. Three families have more than four children so the percentage is 3 divided by 10, multiplied by 100, which equals 30%.
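
If you have access to a computer, the calculations in the answer to SAQ 12.2 can also be checked with a short program. The sketch below is only an illustration and is not part of the survey method itself. It uses Python's standard statistics module, and the names children and summarise are chosen here purely for the example.

```python
from statistics import mean, median, mode

# Number of children in each of the ten households from SAQ 12.2.
children = [3, 1, 6, 4, 0, 3, 3, 5, 8, 4]


def summarise(counts):
    """Return the summary measures asked for in SAQ 12.2 (illustrative helper)."""
    n = len(counts)
    # Count households above each threshold.
    more_than_three = sum(1 for c in counts if c > 3)
    more_than_four = sum(1 for c in counts if c > 4)
    return {
        "mean": mean(counts),        # total children divided by households
        "median": median(counts),    # middle value of the ordered data
        "mode": mode(counts),        # most frequently occurring value
        "proportion_more_than_three": more_than_three / n,
        "percentage_more_than_four": 100 * more_than_four / n,
    }


summary = summarise(children)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The printed values should agree with the answers worked out by hand above. Doing the calculation both ways is a simple check against arithmetic errors.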

Now read Case Study 12.1 and then answer the questions that follow it.

Case Study 12.1 Nutritional problems of women and children

You suspect that a large proportion of women and children in your kebele are malnourished, in particular women of childbearing age. You would like to determine the extent of this problem, and whether women perceive it as a problem. Furthermore you would like to know whether the women themselves could contribute to improving their nutritional status and how they might do this.

SAQ 12.3 (tests Learning Outcome 12.2)

What data collection methods might be appropriate to collect data for this investigation?

Answer

The data required is qualitative because it includes the women’s perceptions and opinions. Interviews and focus group discussions with women could be used to collect this data. Written questionnaires can also be used; however, this will only be suitable if all of the women are literate.

SAQ 12.4 (tests Learning Outcome 12.3)

Describe some biases that could occur during collection of data on nutritional problems of women and children in a situation like the one described in Case Study 12.1. How could these biases be avoided?

Answer

If data are collected using interviews, then the questions would need to be well prepared and devised so they did not lead to particular answers. All interviewers would need to receive appropriate training to ensure that they record the answers in the same way. Bias could also occur if respondents are prompted when answering questions. Respondents should not be handpicked, but selected according to consistent criteria.

SAQ 12.5 (tests Learning Outcome 12.4)

What sort of checks should be done on the data which has been collected before it is analysed and interpreted?

Answer

It is important to check data for consistency and missing values. You should check for errors in order to ensure that the data are reliable before you start to analyse and interpret the data.
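
If survey records have been entered on a computer, checks for missing values and consistency can be partly automated. The following is a minimal sketch under stated assumptions: the field names (household_id, household_size, children_under_five) and the sample records are hypothetical and are not taken from the survey forms in this Module. You would replace them with the fields on your own data collection form.

```python
# Hypothetical field names; replace with those on your own form.
REQUIRED_FIELDS = ("household_id", "household_size", "children_under_five")


def check_record(record):
    """Return a list of problems found in one survey record."""
    problems = []
    # Completeness: every required field must have a value.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Consistency: counts cannot be negative, and the number of
    # children under five cannot exceed the size of the household.
    size = record.get("household_size")
    under_five = record.get("children_under_five")
    if isinstance(size, int) and isinstance(under_five, int):
        if size < 0 or under_five < 0:
            problems.append("negative count")
        if under_five > size:
            problems.append("more children under five than household members")
    return problems


# Made-up example records, for illustration only.
records = [
    {"household_id": "H01", "household_size": 5, "children_under_five": 2},
    {"household_id": "H02", "household_size": None, "children_under_five": 1},
    {"household_id": "H03", "household_size": 3, "children_under_five": 4},
]

# Report the row number (starting at 1) of each record with problems.
for row, record in enumerate(records, start=1):
    problems = check_record(record)
    if problems:
        print(f"Row {row}: {', '.join(problems)}")
```

A check like this only flags records for you to look at again. Deciding how to correct them, for example by revisiting the household, still depends on your own judgement.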

SAQ 12.6 (tests Learning Outcome 12.5)

What ethical issues might you encounter while collecting data on the nutritional problems of women and children in Case Study 12.1?

Answer

It would be important to establish a relationship with, and to obtain informed consent from, each mother before you start to ask a lot of questions. You would have to be aware that nutritional status might be a sensitive issue.


Except for third party materials and/or otherwise stated (see terms and conditions) the content in OpenLearn is released for use under the terms of the Creative Commons Attribution-NonCommercial-Sharealike 2.0 licence. In short this allows you to use the content throughout the world without payment for non-commercial purposes in accordance with the Creative Commons non commercial sharealike licence. Please read this licence in full along with OpenLearn terms and conditions before making use of the content.

When using the content you must attribute us (The Open University) (the OU) and any identified author in accordance with the terms of the Creative Commons Licence.

The Acknowledgements section is used to list, amongst other things, third party (Proprietary), licensed content which is not subject to Creative Commons licensing. Proprietary content must be used (retained) intact and in context to the content at all times. The Acknowledgements section is also used to bring to your attention any other Special Restrictions which may apply to the content. For example there may be times when the Creative Commons Non-Commercial Sharealike licence does not apply to any of the content even if owned by us (the OU). In these instances, unless stated otherwise, the content may be used for personal and non-commercial use. We have also identified as Proprietary other material included in the content which is not subject to Creative Commons Licence. These are: OU logos, trading names and may extend to certain photographic and video images and sound recordings and any other material as may be brought to your attention.

Unauthorised use of any of the content may constitute a breach of the terms and conditions and/or intellectual property laws.

We reserve the right to alter, amend or bring to an end any terms and conditions provided here without notice.

All rights falling outside the terms of the Creative Commons licence are retained or controlled by The Open University.

Head of Intellectual Property, The Open University