Introduction

B126_4 Data analysis: hypothesis testing

About this free course

This free course is an adapted extract from the Open University course B126 Business data analytics and decision making - www.open.ac.uk/courses/modules/b126.

First published 2025. Unless otherwise stated, copyright © 2025 The Open University, all rights reserved. Unless otherwise stated, this resource is released under the terms of the Creative Commons Licence v4.0 http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en. The Open University interprets this licence in the following way: www.open.edu/openlearn/about-openlearn/frequently-asked-questions-on-openlearn.

This course covers the fundamental principles of developing hypotheses. You will learn to distinguish between hypotheses that are directional and non-directional. This knowledge is crucial for the statistical testing that you will learn to implement. You will also explore z-tests and t-tests, two fundamental statistical tools essential for data analysis in business contexts. By understanding the differences between these tests and learning how to apply them effectively, you will develop the skills to assess data-driven claims with confidence.

The course will focus on the characteristics and applications of z-tests and t-tests. You will learn when to use each test based on sample size, population parameters, and the nature of the data. This knowledge will enable you to choose the appropriate test for various business scenarios, ensuring accurate analysis and interpretation of results. To reinforce your learning, you will encounter practical examples and exercises throughout the materials. These hands-on activities will allow you to practise applying z-tests and t-tests to real-world business problems, helping you to solidify your understanding and gain confidence in using these statistical methods.

Please note: this course will require the use of Microsoft Excel or a similar program.

Learning outcomes

After studying this course, you should be able to:
- understand the principle of hypothesis testing
- understand the idea of alpha in hypothesis testing
- differentiate between one-tailed and two-tailed tests
- understand hypothesis testing of means and proportions
- report the exact p-value of a test.

1 Hypothesis testing

In business decision-making, data-driven approaches have become increasingly crucial. At the heart of this process lies hypothesis testing, a statistical method that allows us to evaluate assumptions about population parameters based on sample data. This method provides a structured approach to deciding whether or not to reject these assumptions, offering a solid foundation for informed business choices.

Hypothesis testing serves a vital purpose in the business world. Its primary aim is to determine whether the observed data provides sufficient evidence to support or refute a specific claim about the population. This process is invaluable in various business applications, from evaluating the effectiveness of marketing strategies to assessing product quality. By employing hypothesis testing, business leaders can move beyond gut feelings and intuition, basing their decisions on rigorous statistical analysis.

1.1 Two types of hypothesis

At the core of hypothesis testing are two fundamental concepts: the null hypothesis and the alternative hypothesis. These form the backbone of the testing process and guide our interpretation of the results.

The null hypothesis is denoted as H₀. It serves as the default assumption, positing that there is no relationship between variables or that a population parameter equals a specified value. For instance, if we are testing whether a new training program improves employee productivity, the null hypothesis might state that the program has no effect on productivity.

On the other hand, the alternative hypothesis, represented as H₁ or H_a, is the statement that the decision-maker aims to demonstrate. It suggests that there is indeed an effect or a difference. In our training program example, the alternative hypothesis would state that the program does improve productivity.

To illustrate these concepts further, let us explore some examples that showcase the interplay between null and alternative hypotheses.

For centuries, the prevailing belief in astronomy was the geocentric model, which posited that the Earth was the centre of the universe. This theory, championed by ancient Greek philosophers like Aristotle and later refined by Ptolemy, served as the null hypothesis of its time. This belief persisted until the 16th century when Nicolaus Copernicus proposed a heliocentric model, suggesting that the Sun, not the Earth, was at the centre of the solar system. This revolutionary idea formed the alternative hypothesis.

H₀: All planets orbit around the Earth.
H₁: Not all planets orbit around the Earth.

The gold standard, a monetary system where the value of a country’s currency is directly linked to gold, was widely adopted in the late 19th and early 20th centuries. This system represented a null hypothesis in economic theory. However, the Great Depression of the 1930s severely tested this hypothesis. As economic conditions worsened, many countries found the gold standard limited their ability to implement expansionary monetary policies to combat the depression. This led to the formulation of an alternative hypothesis.

H₀: The value of paper money is equal to a fixed amount of gold.
H₁: The value of paper money is not equal to a fixed amount of gold.

The United Kingdom abandoned the gold standard in 1931, followed by the United States in 1933. The economic recovery that followed in these countries provided evidence supporting the alternative hypothesis, leading to a fundamental shift in monetary policy.

For much of modern history, the value of money has been based on trust in central banks and governmental authorities. This forms our current null hypothesis in monetary theory. However, the advent of cryptocurrencies, particularly Bitcoin in 2009, has challenged this notion. Bitcoin operates on a decentralised network, independent of any central authority. This new form of currency presents an alternative hypothesis.

H₀: The value of paper money is equal to people’s trust in central banks or monetary authorities.
H₁: The value of paper money is not equal to people’s trust in central banks or monetary authorities.

While cryptocurrencies have gained significant traction, their role in the global financial system is still evolving. This ongoing "experiment" continues to test our hypotheses about the nature of money and value.

1.2 Business context

Let us now look at some practical business scenarios to illustrate how hypothesis testing can be applied in real-world situations.

Consider a market research scenario where Apple is evaluating the pricing strategy for its latest iPhone model. A respondent in a focus group states that ‘Apple iPhones are too expensive’. While this statement provides some insight, it lacks the specificity required for rigorous testing. However, if the respondent specifies that an Apple iPhone costing over £500 is too expensive, we can formulate a testable hypothesis:

H₀: The price of an Apple iPhone at £500 is not considered expensive by consumers.
H₁: The price of an Apple iPhone at £500 is considered expensive by consumers.

To test such a hypothesis, Apple’s market research team might design a comprehensive survey to gather data from a representative sample of consumers. They could pose questions about price point perceptions, purchase intentions at specific price levels, and how the £500 price tag compares to consumer expectations. The subsequent analysis of survey responses using statistical methods would provide evidence either supporting or refuting the alternative hypothesis. The outcomes of this hypothesis test could significantly influence Apple’s pricing strategy, potentially leading to price adjustments, enhanced value propositions, or product line segmentation to offer more affordable options.

As another example, if we believe that the average annual salary in the UK is approximately £26,000, that belief can be treated as the null hypothesis. Thus, we have:

H₀: The average annual salary in the UK is equal to £26,000.
H₁: The average annual salary in the UK is not equal to £26,000.

Testing these hypotheses would involve collecting salary data from a representative sample of UK workers through large-scale surveys, analysis of government data, or information from job postings and recruitment agencies. Once the data is collected, decision-makers would use statistical tests (we will introduce them later) to determine whether to reject the null hypothesis.

By formulating clear and testable hypotheses, businesses can design appropriate tests, gather relevant data, and draw meaningful conclusions. This process enables leaders to make informed decisions based on statistical evidence rather than assumptions or intuitions. For instance, in the iPhone pricing scenario, Apple could use the results to fine-tune their pricing strategy for different markets, develop marketing messages that address price perceptions, and inform product development decisions to align with consumer value expectations. In the salary investigation case, businesses could adjust their compensation packages to attract and retain talent, benchmark their salaries against industry standards, and forecast labour costs more accurately for financial planning.

As you progress in your studies and career, remember that mastering hypothesis formulation and testing is crucial for effective business decision-making. It allows you to ask the right questions, design robust analyses, and interpret results accurately. In today’s data-driven business landscape, the ability to formulate and test hypotheses can provide a significant competitive advantage, enabling you to uncover insights that drive innovation and success. Whether you are launching a new product, entering a new market, or optimising internal processes, this scientific approach to decision-making will equip you to navigate the complexities of modern business.

Activity 1: Null hypothesis versus alternative hypothesis

Allow around 10 minutes for this activity.

Read the following statements. Can you develop a null hypothesis and an alternative hypothesis?

‘It is believed that a high-end coffee machine produces a cup of caffè latte with an average of 1 cm of foam. The hotel employee claims that after the machine has been repaired, it is no longer able to produce a cup of caffè latte with 1 cm of foam.’

H₀: The coffee machine makes a cup of caffè latte with 1 cm of foam on average.
H₁: The coffee machine does not make a cup of caffè latte with 1 cm of foam on average.

If you have developed the hypotheses H₀ and H₁ as mentioned above, you have shown that you are familiar with the structure of different types of hypotheses.

1.3 Hypothesis formulation

While these hypotheses demonstrate a basic understanding of hypothesis structure, we can refine them further to align more closely with statistical conventions. In hypothesis formulation, we typically express the null hypothesis (H₀) in terms of the population mean (µ), which represents the average value in the population.

To illustrate this concept, let us revisit our earlier example of the average UK salary. The widely accepted belief that the average UK salary is £26,000 per year can be expressed statistically as:

H₀: µ = £26,000

Here, µ represents the population mean salary. The alternative hypothesis (H₁), which challenges this belief, can be expressed as:

H₁: µ ≠ £26,000

This formulation clearly shows that H₁ is the opposite of H₀, reflecting its purpose of challenging the widely held belief represented by the null hypothesis.

Applying this more precise formulation to our caffè latte foam example, we can express the hypotheses as:

H₀: µ = 1 cm foam
H₁: µ ≠ 1 cm foam

In this case, µ represents the population mean for the height of foam in a caffè latte. The null hypothesis states that this mean is exactly 1 cm, while the alternative hypothesis suggests a departure from this widely accepted belief. Note that H₀ and H₁, when taken together, cover all possible outcomes regarding the foam height.

This approach to hypothesis formulation offers several advantages in business decision making. Firstly, it provides a clear, quantifiable statement that can be tested statistically. Secondly, it allows for precise measurement and analysis, enabling businesses to make data-driven decisions with confidence. Finally, it sets the stage for more advanced statistical analyses, such as determining the significance of any observed differences from the hypothesised mean.

2 Alpha (α) levels

The process of hypothesis testing is a fundamental aspect of scientific research and statistical analysis. It provides a structured approach to evaluate claims and make decisions based on empirical evidence. Let us explore the two possible outcomes of hypothesis testing in more detail.

Rejecting the Null Hypothesis (H₀)
When decision-makers reject the null hypothesis, it implies that there is sufficient evidence to support the alternative hypothesis (H₁). This outcome occurs when the observed data is statistically significant and unlikely to have occurred by chance alone.

Failing to Reject the Null Hypothesis (H₀)
When decision-makers fail to reject the null hypothesis, it means that there is insufficient evidence to support the alternative hypothesis (H₁). This outcome occurs when the observed data is not statistically significant and could have occurred by chance.

To illustrate these concepts, let us return to our coffee example. Suppose we want to test the null hypothesis that the average foam height in a caffè latte is 1 cm (H₀: µ = 1 cm foam). We might randomly sample 60 cups of caffè latte throughout the day, measure the foam height, and calculate the average and test statistic.

Consider a study where three decision-makers each sample 60 cups of caffè latte:
- Decision-maker 1 finds an average foam height of 1.1 cm.
- Decision-maker 2 finds an average foam height of 1.5 cm.
- Decision-maker 3 finds an average foam height of 2.6 cm.

Decision-maker 1’s result (1.1 cm) is close to the null hypothesis value. In this case, we might fail to reject H₀, as the observed difference could be due to random variation. However, this does not prove that the true average foam height is exactly 1 cm. It only indicates that we do not have enough evidence to conclude otherwise.

Decision-maker 3’s result (2.6 cm) is far from 1 cm. Here, we would likely reject H₀. This suggests that the observed difference is statistically significant and unlikely to have occurred by chance.

Decision-maker 2’s result (1.5 cm) is less clear-cut. Is 1.5 cm far enough from 1 cm to count as sufficient evidence against H₀? Answering this question requires the concept of ‘statistical significance’, which you will look at in more detail in the next section.

2.1 Statistical significance

To decide whether a result such as Decision-maker 2’s is far enough from the hypothesised value, we need the concept of ‘statistical significance’. The level of statistical significance is the threshold at which we decide whether to reject the null hypothesis. A result is statistically significant when data like those observed would be highly unlikely to occur if the null hypothesis were true. Statistical significance gives us a concrete way of examining the claim in the null hypothesis, using the data we have collected, so that we can make a clear decision on when to reject the null hypothesis and when not to. In this way, we do not have to guess whether the test statistic is too high or too low.

How can we determine the appropriate level of statistical significance to use? This is a crucial step in hypothesis testing. To understand it, we need to introduce the concept of “level of confidence” or “confidence level”. These terms are used interchangeably in statistics and research methodology. The level of confidence represents how certain we want to be in our decision to reject the null hypothesis, and it is closely related to confidence intervals. Typically, decision-makers set the confidence level at 95% when establishing a confidence interval.

Once we have determined the confidence level, we can calculate the level of statistical significance as 1 − level of confidence. For example, if we choose a 95% level of confidence, the level of statistical significance would be:

1 − 95% = 5% or 0.05

We refer to this level of statistical significance as “alpha” or “α”.

Let us apply this concept to our coffee example. Decision-maker 2 found that the average foam height was 1.5 cm, which differs from the expected value of 1 cm. The question is whether this difference is large enough to reject the null hypothesis (H₀). To make this decision, decision-makers must consider the alpha level. The alpha level serves as a threshold for statistical significance. It helps determine whether the observed results are likely due to chance or represent a genuine difference between the expected and observed values.

Once decision-makers determine the alpha level, they can use it to decide whether to reject or fail to reject the null hypothesis based on Decision-maker 2’s findings. The choice of alpha level is critical, as it directly influences the conclusion drawn from the study. For instance, if the decision-makers choose an alpha level of 0.05 (corresponding to a 95% confidence level), they would reject the null hypothesis if the probability of obtaining results at least as extreme as theirs, assuming the null hypothesis is true (the p-value), is less than 5%. If that probability is greater than 5%, they would fail to reject the null hypothesis.

Choosing the appropriate alpha level requires careful consideration. A lower alpha level (e.g., 0.01) makes it harder to reject the null hypothesis, reducing the risk of false positives but potentially missing real effects. A higher alpha level (e.g., 0.10) makes it easier to reject the null hypothesis, potentially detecting more real effects but also increasing the risk of false positives.

The significance level helps us decide whether to reject or fail to reject the null hypothesis when the results are not clearly for or against it. By setting a specific threshold for statistical significance, we can make more objective decisions about our hypotheses based on the collected data. This approach provides a systematic method for evaluating hypotheses and helps to standardise the decision-making process across different studies and fields of research, ensuring consistency and comparability in findings. However, it is crucial to remember that statistical significance does not always imply practical significance, and decision-makers should consider the context and real-world implications of their findings alongside the statistical results.

Activity 2: Level of alpha

Allow around 10 minutes for this activity.

Imagine you are working as a marketing analyst for a new product launch, and you want to make sure that your market research is accurate and reliable. You need to determine the appropriate alpha level for your survey results, based on the desired confidence level. Can you determine α?

Table 1 Level of alpha

Level of Confidence (C)	α
90%
95%
99%

Table 1 Level of alpha (completed)

Level of Confidence (C)	α
90%	10%
95%	5%
99%	1%

α is calculated as 1 − C:

1 − 90% = 10%
1 − 95% = 5%
1 − 99% = 1%

There is an inverse relationship between confidence levels and alpha levels: increasing the confidence level decreases the alpha level, and vice versa.
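The relationship α = 1 − C and the decision rule from Section 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `alpha_from_confidence` and `decide` are made up for this example, and the p-value of 0.03 is a hypothetical number chosen to show how the choice of α changes the decision.

```python
def alpha_from_confidence(confidence: float) -> float:
    """Return the significance level alpha = 1 - C for a confidence level C in (0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - confidence


def decide(p_value: float, alpha: float) -> str:
    """Decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Reproduce the completed Table 1.
for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}  ->  alpha = {alpha_from_confidence(c):.2f}")

# Hypothetical p-value, used only to illustrate the effect of alpha.
p = 0.03
for a in (0.10, 0.05, 0.01):
    print(f"alpha = {a:.2f}: {decide(p, a)}")
```

With this hypothetical p-value, the null hypothesis is rejected at α = 0.10 and α = 0.05 but not at α = 0.01, which mirrors the point that a lower alpha level makes it harder to reject H₀.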

3 One-tailed vs Two-tailed test

Understanding how to conduct hypothesis tests is crucial in statistical analysis. This section explores the concepts of one-tailed and two-tailed tests, which are essential tools in statistical hypothesis testing. The choice between these tests depends on the specific research question and hypothesis under investigation. Decision-makers must carefully consider which type of test to use to ensure a thorough examination of the hypothesis and draw accurate conclusions from the data.

To deepen your understanding of these concepts, we will now engage in an activity focused on formulating null and alternative hypotheses. This exercise may present some challenges, but it serves as an excellent foundation for our subsequent discussions. Do not worry if you find it difficult initially, as this is a common experience when learning these concepts.

Activity 3: Hypotheses setting

Allow around 10 minutes for this activity.

Read the following statements and then develop a null hypothesis and an alternative hypothesis.

“It is believed that OU students need to set aside no longer than, on average, 15 hours to study an entire session of an OU module. However, a decision-maker believes that OU students spend longer studying an entire session of the OU module”.

You may find this statement different from others you have encountered, so please take longer to think about it.

H₀: OU students spend, on average, no more than 15 hours studying an entire session of an OU module.
H₁: OU students spend, on average, more than 15 hours studying an entire session of an OU module.

They can also be written as:

H₀: µ ≤ 15 hours of study
H₁: µ > 15 hours of study

µ is the symbol for a population mean. Remember, H₀ and H₁ are always opposites.

3.1 Non-directional hypotheses

Non-directional hypotheses, also known as two-tailed hypotheses, form the basis for open-ended investigations. Decision-makers use these hypotheses, which employ equal (=) or not equal (≠) signs, when they do not predict a specific direction for the relationship or difference under study. This approach proves particularly useful in exploratory research or when examining complex phenomena, allowing decision-makers to study relationships between variables without preconceived notions.

Non-directional hypotheses aim to determine whether a statistically significant difference or relationship exists between two or more variables. The term "two-tailed" refers to the statistical test used to evaluate the hypothesis, which considers the possibility that the difference or relationship could occur in either direction.

A non-directional hypothesis typically comprises two components:
- The null hypothesis (H₀): This usually states that there is no difference or relationship between the variables being studied.
- The alternative hypothesis (H₁): This suggests that there is a significant difference or relationship, but does not specify the direction of this difference or relationship.

3.2 Directional hypotheses

Directional hypotheses, also known as one-tailed hypotheses, predict the direction of a relationship or difference between variables. These hypotheses use signs like less than or equal to (≤) and greater than (>) in their statements. For example, a directional hypothesis might propose: “A marketing campaign will increase product sales”. This hypothesis specifies the expected outcome (an increase in sales) before data collection begins.

Directional hypotheses offer several advantages in research:
- They provide more precise and focused predictions than non-directional hypotheses.
- They are often preferred in scientific research due to their specificity.
- In business management, they can help design studies to examine the effectiveness of strategies, such as marketing campaigns.

To test a directional hypothesis, decision-makers use a one-tailed test. This statistical test aims to determine if the data supports the anticipated direction of the relationship or difference.

Let us reconsider the study time example:

H₀: µ ≤ 15 hours of study
H₁: µ > 15 hours of study

Here, the null hypothesis (H₀) states that the population mean (µ) is less than or equal to 15 hours of study. The alternative hypothesis (H₁) predicts that the population mean is greater than 15 hours of study.
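The practical difference between a one-tailed and a two-tailed test shows up in how the p-value is calculated. The sketch below assumes that a standardised test statistic z has already been computed and that it follows a standard normal distribution (test statistics of this kind are introduced in Section 4). The function name `p_value_from_z`, the tail labels and the value z = 1.80 are all hypothetical choices made for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """Return the p-value for a standard normal test statistic z.

    tail = "two-sided" for H1: mu != mu0 (non-directional),
    tail = "greater"   for H1: mu >  mu0 (directional),
    tail = "less"      for H1: mu <  mu0 (directional).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if tail == "greater":
        return 1.0 - std_normal.cdf(z)
    if tail == "less":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'greater' or 'less'")


z = 1.80  # hypothetical test statistic
alpha = 0.05
for tail in ("two-sided", "greater"):
    p = p_value_from_z(z, tail)
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{tail:>9}: p = {p:.4f} -> {verdict}")
```

For the same test statistic, the one-tailed p-value is half the two-tailed one when the statistic lies in the predicted direction. With this hypothetical z, the directional test rejects H₀ at α = 0.05 while the non-directional test does not, which is why the direction should be chosen before the data are examined.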

4 Z-test vs t-test

Now you will explore two fundamental statistical tests for comparing means: the z-test and the t-test. Understanding these tests is crucial for making accurate comparisons and drawing valid conclusions in various business scenarios.

The key factor in choosing between a z-test and a t-test is whether you know the population standard deviation:
- Z-test: Use when the population standard deviation is known.
- T-test: Use when the population standard deviation is unknown.

4.1 Z-test: known population standard deviation

The z-test is appropriate when you have access to the population standard deviation. This often occurs when you have comprehensive industry reports or extensive historical data.

Key characteristics of the z-test include:
- It uses the known population standard deviation in calculations.
- The test statistic follows a standard normal distribution (bell-shaped curve).
- It generally provides more precise results when population parameters are available.

These characteristics are reflected in the z-statistic formula:

Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}

Where:
\bar{x} = sample mean
\mu = population mean under the null hypothesis
\sigma = population standard deviation
n = sample size

This formula reflects the z-test’s characteristics by directly incorporating the known population standard deviation (σ) and transforming the difference between the sample mean and population mean into a standardised score that follows the standard normal distribution.

In Unit 1, you learned how to calculate the z-score, which is given by a similar-looking formula:

Z = \frac{x - \mu}{\sigma}

Where:
x = raw score
\mu = population mean
\sigma = population standard deviation

You may question how this relates to the z-statistic we have been discussing. While these concepts are related, they serve different purposes in statistical analysis. The key difference lies in their focus.

The z-score focuses on a particular data point within a population. It tells us how many standard deviations an individual value is from the population mean. Imagine you are analysing the performance of a new marketing campaign. You would use a z-score to determine how a specific customer’s spending compares to the average customer spending across all campaigns.

The z-statistic, on the other hand, focuses on the sample mean in relation to the population parameters. It tells us how many standard errors the sample mean is from the hypothesised population mean. This is crucial for hypothesis testing and making inferences about populations based on sample data. Using the same marketing campaign example, you would use a z-statistic to test whether the average spending in your new campaign (based on a sample) is significantly different from the average spending across all campaigns (the population mean).
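The contrast between the two formulas can be made concrete with a small sketch. All the figures below (a population mean spend of £50, a population standard deviation of £10, one customer spending £65, and a sample of 25 customers with a mean spend of £54) are hypothetical numbers invented for illustration, as are the function names.

```python
from math import sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard deviations between one raw score x and the population mean."""
    return (x - mu) / sigma


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Standard errors between a sample mean and the hypothesised mean mu0."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical population parameters for customer spending (in pounds).
mu, sigma = 50.0, 10.0

# z-score: one customer's spending compared with the population.
print(z_score(65.0, mu, sigma))

# z-statistic: the mean of a sample of 25 customers compared with mu.
print(z_statistic(54.0, mu, sigma, 25))
```

The only structural difference is the denominator: the z-score divides by σ, while the z-statistic divides by the standard error σ/√n. Because of this, the same gap between a sample mean and the hypothesised mean produces a larger z-statistic as the sample size grows.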

4.2 T-test: unknown population standard deviation

You will use the t-test when you lack information about the population standard deviation, which is common in many real-world business scenarios.

Key characteristics of the t-test include:
- It estimates the population standard deviation using sample data.
- The test statistic follows a t-distribution (similar to the normal distribution but with heavier tails).
- It is more versatile for real-world scenarios where population parameters are unknown.

The t-statistic formula is typically expressed as:

t = \frac{\bar{x} - \mu}{s / \sqrt{n}}

Where:
\bar{x} = sample mean
\mu = hypothesised population mean
s = sample standard deviation
n = sample size

In the real world, we often cannot measure every single thing in a population. Instead, we take a smaller group (a sample) and use it to make educated guesses about the whole population. The formula uses “s”, which is calculated from our sample, to estimate how spread out the entire population might be.

The t-distribution follows a pattern that differs from the normal distribution. When you calculate the t-statistic multiple times using various samples, the results conform to this t-distribution. While similar to the bell-shaped curve of the normal distribution, the t-distribution features heavier “tails” or edges. This characteristic accounts for the additional uncertainty inherent in estimating population parameters from sample data.

Importantly, the degree of “fatness” in these tails is not constant. It depends on the sample size, a concept we will explore in more detail in the next section. Generally, as the sample size increases, the t-distribution more closely resembles the normal distribution. This relationship between sample size and the shape of the t-distribution plays a crucial role in statistical inference and will influence how you interpret your results in various business scenarios.

Figure 5 Z-distribution vs T-distribution

The graph compares the z-distribution and t-distribution, showing their distinct shapes. The z-distribution, represented by the blue curve, has a sharper peak and thinner tails, indicating it is more concentrated around the mean. In contrast, the t-distribution, shown by the orange curve, has thicker tails, reflecting greater variability.

The t-test works well in real-life situations because its formula does not require exact information about the whole population. Most of the time, we do not know everything about an entire population, so this test helps us make good guesses.

We do not need to know how spread out the whole population is. Notice that the formula does not include σ (the population standard deviation; you can go back to check the z-statistic formula), which represents the spread of values in the population. We often do not know this value in real life. That is why we use ‘s’ from our sample instead. This is a key reason why we choose to use a t-test in many situations.

4.3 Additional consideration

Traditionally, decision-makers would choose between z-tests and t-tests based on two main factors: a) whether the population standard deviation was known (z-test) or unknown (t-test), and b) the size of the sample (when n < 30, they would choose to use a t-test). This approach was rooted in the different properties of the normal distribution and t-distribution, particularly for smaller sample sizes.

However, modern statistical practice often favours using t-tests regardless of sample size when the population standard deviation is unknown. This shift occurred because, as sample sizes increase, the t-distribution closely resembles the normal distribution (z-distribution), as shown in Figure 6. For small samples, the t-test accounts for the extra uncertainty from estimating the population standard deviation. For large samples, it gives nearly identical results to a z-test.

This consistent use of t-tests simplifies statistical analysis, eliminating the need to decide when to switch from a t-test to a z-test based on sample size. It also provides a slightly more conservative estimate for smaller samples, reducing the risk of false positive results. With today’s statistical software easily handling t-tests for any sample size, this approach combines mathematical accuracy with practical simplicity across various research scenarios, moving beyond the traditional "sample size of 30" rule.

Figure 6 Z-distribution vs T-distribution with different sample size The graph illustrates the comparison between the z-distribution and two t-distributions with different sample sizes. The blue curve represents the z-distribution (normal distribution), while the orange curve shows the t-distribution with a smaller sample size, and the green curve depicts the t-distribution with a larger sample size. As the graph indicates, the t-distribution with a smaller sample size has the thickest tails, reflecting greater variability. As the sample size increases, the t-distribution (green curve) becomes more similar to the z-distribution, which has the narrowest peak and the thinnest tails, demonstrating how t-distributions converge to the normal distribution as sample size grows.

4.4 Non-directional hypotheses

Non-directional hypothesis testing involves testing the null hypothesis that the population mean is equal to a certain value. The alternative hypothesis asserts that the population mean is not equal to that value. As mentioned previously, the test of such a hypothesis is two-tailed, in the sense that the null hypothesis gets rejected whenever there is a sizable difference between the sample mean and the hypothesised mean in either direction. But how sizable does this difference need to be in order for us to reject the hypothesis? This is where the concept of critical values comes in.

Critical values play a vital role in statistical hypothesis testing and confidence interval construction. A critical value is a threshold that defines the boundary between the “fail to reject” and “rejection” regions in a statistical hypothesis test. It marks the point where a test statistic becomes statistically significant, allowing decision-makers to decide whether to reject or fail to reject the null hypothesis.

There are several types of critical values, depending on the statistical test being performed. For example:

Z critical value: Used for z-tests and based on the standard normal distribution.
T critical value: Used for t-tests and based on the t-distribution.

The two-tailed nature of a hypothesis becomes evident when we consider the rejection region, which includes outcomes from both the upper and lower tails of the sampling distribution.

Figure 7 Two-tailed test

A symmetrical, bell-shaped graph. The graph shows the z-score axis and the foam height axis, and circles the rejection regions of the null hypothesis on both sides of the bell curve.

Hypothesis testing is fundamentally rooted in probability theory. When we conduct a hypothesis test, we essentially ask: "What is the probability of observing our sample results if the null hypothesis were true?"

In our coffee foam example, we are examining the probability distribution of sample means. Under the null hypothesis (H₀: μ = 1 cm), we assume that if we were to repeatedly draw samples and calculate their means, these sample means would follow a normal distribution centred around 1 cm. We establish two competing hypotheses:

H₀: μ = 1 cm
H₁: μ ≠ 1 cm

Here, μ represents the true population mean foam height. The null hypothesis assumes no difference from our target value (1 cm), while the alternative hypothesis suggests a significant deviation.

Figure 8 Two-tailed test – Coffee foam

A symmetrical, bell-shaped graph. The graph shows the z-score axis and the foam height axis, and circles the rejection regions of the null hypothesis on both sides of the bell curve.

Our chosen significance level, α = 0.05, has a direct probabilistic interpretation. It means we are willing to accept a 5% chance of rejecting the null hypothesis when it is actually true. In a two-tailed test, we divide this 5% probability equally between the two tails of the distribution. This results in:

2.5% probability in the left tail (area below z = -1.96)
2.5% probability in the right tail (area above z = +1.96)

It is worth noting that the orange area in the figure is equivalent to 5%. The ‘fail to reject region’ needs to amount to the remaining 95%, because the probabilities of all possible outcomes sum to 100%. The central 95% of the distribution (between z = -1.96 and z = +1.96) represents the range of sample means that we would consider "not unusual" if the null hypothesis were true.

The rejection region in our two-tailed test corresponds to the areas of low probability under the null hypothesis. Specifically:

If our test statistic falls in either tail (z < -1.96 or z > 1.96), it means we have observed a result that had less than a 2.5% chance of occurring in that tail if H₀ were true.
The total probability of falling in the rejection region is 5% (2.5% + 2.5%).

Interpreting Results Probabilistically:

If we reject H₀: We conclude that we have observed a result that had a less than 5% chance of occurring if the true population mean were 1 cm. This low probability leads us to doubt the null hypothesis.
If we fail to reject H₀: We have observed a result that had a greater than 5% chance of occurring under H₀. This does not prove H₀ is true, but suggests we lack strong evidence against it.

It is crucial to understand that these probabilities relate to the long-run frequency of results if we were to repeat our sampling process many times. They do not tell us the probability that H₀ is true or false in any single test.

We will explore how to calculate the actual probability of our observed results using the z-test later. This will involve determining where our sample mean falls in this probability distribution and calculating the associated p-value.
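The tail areas quoted above can be checked numerically. The sketch below uses Python's standard-library `NormalDist` class as a stand-in for the statistical tables; this is an illustrative alternative to the Excel functions used in this course, not part of the original material.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

left_tail = standard_normal.cdf(-1.96)       # area below z = -1.96
right_tail = 1 - standard_normal.cdf(1.96)   # area above z = +1.96
central = 1 - left_tail - right_tail         # 'fail to reject' region

print(f"left tail  = {left_tail:.3f}")   # 0.025
print(f"right tail = {right_tail:.3f}")  # 0.025
print(f"central    = {central:.3f}")     # 0.950
```

The three areas add up to 1, matching the statement that the rejection region (5%) and the fail to reject region (95%) together cover all possible outcomes.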

4.5 Directional hypotheses

As you may recall from an earlier discussion, directional hypotheses are referred to as one-tailed hypotheses. These hypotheses use inequality signs in their statements. To test a directional hypothesis, you would be expected to perform a one-tailed test, which aims to verify whether the data from the sample supports the anticipated direction of the relationship or difference.

To conduct a one-tailed test, decision-makers establish a critical value to determine whether to reject or retain the null hypothesis. This process typically involves setting a significance level (α). For instance, when α = 0.05, the z critical value for a one-tailed test in a normal distribution is 1.65.

You may notice that the z critical value of 1.65 for a one-tailed test with α = 0.05 differs from the 1.96 used in two-tailed tests. This difference arises from the nature of these tests and how they distribute the significance level. In a one-tailed test, we focus on only one direction of the distribution (either the upper or the lower tail). Consequently, we allocate the entire α (0.05) to one tail of the distribution.

To understand this better, consider that if one tail is associated with 0.05, then both tails would be associated with 2 x 0.05 = 0.10, which is equivalent to a 90% confidence level. At a 90% confidence level, the z-score equals 1.65. This relationship explains why we use 1.65 as the z critical value in one-tailed tests with α = 0.05, rather than the 1.96 used in two-tailed tests at the same significance level. In this case:

The null hypothesis would be rejected if the z-statistic exceeds 1.65.
Only the upper tail region of the distribution is considered for rejection in a one-tailed test.
The area in the tail above z = +1.65 represents 0.05 of the distribution.
Unlike in a two-tailed test, the alpha level does not need to be divided by two for a one-tailed test.
This approach allows decision-makers to focus on detecting effects in a specific direction, making it particularly useful when there’s a strong theoretical or practical reason to expect a particular outcome.

Figure 9 One-tailed test – right (upper) tail.

A symmetrical, bell-shaped graph. The graph shows the z-score axis and the hours of study axis, and circles the rejection region of the null hypothesis on the right-hand side of the bell curve.

In summary, decision-makers use one-tailed tests to evaluate directional hypotheses, which predict the direction of a difference or association between two variables. The critical value for these tests depends on the chosen significance level (α), and the test aims to determine if the data supports the predicted direction.

One-tailed tests are versatile and can be used for both "greater than" and "less than" scenarios. Let us consider an example to illustrate this flexibility: Imagine a department store where management believes the average customer spend per visit is £65. However, the service manager suspects customers are spending less. We can formulate the following hypotheses:

H₀: µ ≥ £65
H₁: µ < £65

In this case, the null hypothesis (H₀) states that the average spend is greater than or equal to £65, while the alternative hypothesis (H₁) suggests it is less than £65.

To test this directional hypothesis, we conduct a one-tailed test. The alternative hypothesis predicts that the true value of µ will be lower than £65, so we focus on the lower tail of the normal distribution for our rejection region. At an alpha level of 0.05, the z critical value for the lower tail is -1.65. This means we would reject the null hypothesis if our calculated z-statistic falls below -1.65. Graphically, this rejection region appears in the lower tail of the normal distribution curve.

Figure 10 One-tailed test – left (lower) tail.

A symmetrical, bell-shaped graph. The graph shows the z-score axis and the customer spending axis, and circles the rejection region of the null hypothesis on the left-hand side of the bell curve.

In general, the one-tailed test is not restricted to a specific direction and can be used in either direction, depending on the research question and the hypothesis being tested. The test is used to determine if the data supports a directional hypothesis, and a critical value is established based on the significance level chosen for the test.
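The decision rule for both directions can be summarised in a few lines of Python. This is a minimal sketch: the function name `one_tailed_decision` and the z-statistics passed to it are hypothetical, and the default critical value is the rounded 1.65 used in the text for α = 0.05.

```python
def one_tailed_decision(z_statistic: float, tail: str, z_critical: float = 1.65) -> str:
    """Return the decision for a one-tailed z-test.

    tail is "upper" (H1: mu > value) or "lower" (H1: mu < value).
    z_critical is the positive critical value; 1.65 corresponds to
    alpha = 0.05 as rounded in the text.
    """
    if tail == "upper":
        reject = z_statistic > z_critical
    elif tail == "lower":
        reject = z_statistic < -z_critical
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    return "reject H0" if reject else "fail to reject H0"


# Hypothetical z-statistics, for illustration only.
print(one_tailed_decision(2.10, "upper"))   # reject H0
print(one_tailed_decision(-1.20, "lower"))  # fail to reject H0
print(one_tailed_decision(-1.90, "lower"))  # reject H0
```

The same positive critical value serves both directions; only the sign and the direction of the comparison change.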

5 Z Critical value and z-test

We have learned earlier that critical values play a vital role in statistical hypothesis testing and confidence interval construction. A critical value is a threshold that defines the boundary between the “fail to reject” and “rejection” regions in a statistical hypothesis test. It marks the point where a test statistic becomes statistically significant, allowing decision-makers to decide whether to reject or fail to reject the null hypothesis.

In this section, we first need to learn how to calculate z critical values. For the purpose of this course, we will use Excel to do so. This knowledge forms the foundation for conducting statistical tests effectively. Let us explore this concept, focusing on both one-tailed and two-tailed tests.

5.1 One-tailed test

In a one-tailed test, we focus on either the upper (right) or lower (left) tail of the distribution. For our example, we will consider an upper-tailed test with a significance level (α) of 5% or 0.05. Excel provides a useful function to find z critical values quickly and accurately for this purpose: the NORM.S.INV() function.

In an upper-tailed test, the z critical value is the boundary below which 1 – 5%, or 95%, of the z-distribution lies.

Figure 11 Upper tailed test

The image depicts the concept of an upper-tailed test in statistics, commonly used in hypothesis testing. In this test, the area under the curve represents the probability distribution, with the majority (95%) of the distribution being under the main portion of the curve.

To use NORM.S.INV(), we input the corresponding probability that we are interested in:

=NORM.S.INV(0.95)

This formula returns a z critical value of approximately 1.645, which we can round to 1.65 (this value has been used in earlier sections).

For the upper tail (e.g., H1: μ > certain value):

The critical value would be positive: +1.65
We would reject the null hypothesis if the calculated z-statistic is greater than 1.65.

For a lower-tailed test, we focus on the left side of the distribution. Here, the z critical value is the boundary below which 5% of the z-distribution lies (Figure 13).

Figure 13 Lower Tailed Test

This graph is created by the author. The image depicts the concept of a lower-tailed test in statistics, commonly used in hypothesis testing. In this test, the area under the curve represents the probability distribution, with 5% of the distribution lying in the lower tail.

To find this z critical value using NORM.S.INV(), we input the probability of 5%:

=NORM.S.INV(0.05)

This returns approximately -1.645, which we can round to -1.65. Thus, for a lower-tailed test (H1: μ < certain value):

The critical value would be negative: -1.65
We would reject the null hypothesis if the calculated z-statistic is less than -1.65.

The direction of the alternative hypothesis therefore determines the sign of the critical value in one-tailed tests. The rejection criterion depends on whether we are testing for values significantly greater than (upper tail) or less than (lower tail) a certain value.

5.2 Two-tailed test

For a two-tailed test, we need to adjust our approach slightly. We still use NORM.S.INV(), but we must account for the probability being split between two tails. Let us use the same example (α = 5%). In a two-tailed test, we split our α evenly between the two tails:

\text{Split } α = \frac{0.05}{2} = 0.025

Figure 14 Two tailed test The image illustrates a two-tailed test used in hypothesis testing. In this test, both extremes (tails) of the probability distribution are considered. The curve represents the distribution, and the two shaded regions on the left and right represent the lower and upper tails, each accounting for 2.5% of the total area under the curve. The lower tail z-value would correspond to the 2.5% of the distribution, so you would use: =NORM.S.INV(0.025) This gives approximately -1.96. The upper tail z-value would correspond to the 97.5% of the distribution, so you would use: =NORM.S.INV(0.975) This gives us a z critical value of approximately 1.96. Together, the z critical values are: ±1.96 In practice, this means we reject the null hypothesis if our calculated test statistic is either smaller than -1.96 or larger than 1.96. These values create our "rejection region" - the areas in the tails of the distribution where the evidence against the null hypothesis is strong enough to warrant rejection. To interpret your results: If your test statistic falls between -1.96 and 1.96, you fail to reject the null hypothesis. If your test statistic is less than -1.96 or greater than 1.96, you reject the null hypothesis.
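For readers who want to cross-check the Excel results outside a spreadsheet, the following sketch reproduces the four NORM.S.INV() calls from Sections 5.1 and 5.2 using `NormalDist.inv_cdf` from Python's standard library. Treating `inv_cdf` as the counterpart of NORM.S.INV() is the assumption here; both return the z value below which a given proportion of the standard normal distribution lies.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
alpha = 0.05

# One-tailed tests: all of alpha sits in a single tail.
upper_tailed = std_normal.inv_cdf(1 - alpha)   # like =NORM.S.INV(0.95)
lower_tailed = std_normal.inv_cdf(alpha)       # like =NORM.S.INV(0.05)

# Two-tailed test: alpha is split evenly, 0.025 in each tail.
two_tailed_lower = std_normal.inv_cdf(alpha / 2)      # like =NORM.S.INV(0.025)
two_tailed_upper = std_normal.inv_cdf(1 - alpha / 2)  # like =NORM.S.INV(0.975)

print(f"upper-tailed: {upper_tailed:.3f}")   # 1.645
print(f"lower-tailed: {lower_tailed:.3f}")   # -1.645
print(f"two-tailed:   {two_tailed_lower:.2f} and {two_tailed_upper:.2f}")  # -1.96 and 1.96
```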

5.3 Practical example

Now that you know how to calculate z critical values, let us do a z-test together. Consider the following question: A digital marketing agency has recently implemented a new advertising campaign aimed at increasing purchases on their e-commerce platform.

The marketing team has gathered data from a sample of 80 customers to assess the campaign’s effectiveness. The marketing executive asserts that the campaign has been successful, claiming that the average number of purchases per day has increased compared to the previous average. They want to test this claim with a 95% confidence level. Download the Excel file to review the data.

Excel file: Digital marketing

Your task is to conduct a z-test to evaluate the marketing executive’s claim using the provided digital marketing dataset. Let us solve this problem step by step using Excel:

Step 1: Find the Population Mean

First, we need to calculate the average of monthly purchases. In Excel:

Go to the ‘Population’ sheet.
Use the formula: =AVERAGE(A2:A1001) where A2:A1001 contains the population data.

The result shows the average number of purchases was 50.20. This is our population mean.

Step 2: Formulate Hypotheses

Based on the marketing executive’s claim, we can formulate our hypotheses:

H₀: The digital marketing campaign will produce equal or less than average monthly purchase results (µ ≤ 50.20).
H₁: The digital marketing campaign will produce greater than average monthly purchase results (µ > 50.20).

Note that we specify the claim we want to prove as the alternative hypothesis.

Step 3: Calculate the Z-statistic

We use the formula:

Z = \frac{\bar{x} - μ}{(\frac{σ}{\sqrt{n}})}

Where:

\bar{x} = sample mean
μ = population mean under the null hypothesis
σ = population standard deviation
n = sample size

To find these values:

1. Calculate the sample mean: Go to the ‘Sample’ sheet. Use the formula: =AVERAGE(A2:A81) This gives us \bar{x} = 53.03

2. We already know µ = 50.20 from Step 1.

3. Calculate the population standard deviation: Go back to the ‘Population’ sheet. Use the formula: =STDEV.P(A2:A1001) This gives us σ = 9.81

Important note on standard deviation functions: Excel provides two functions for calculating standard deviation: STDEV.P() and STDEV.S(). Use STDEV.P() when your data represents the entire population. Use STDEV.S() when your data is just a sample of the population. In this case, we use STDEV.P() because we have data for the entire population of purchases.

4. We know n = 80 from the question and from our data set.

Now, let us calculate the z-statistic:

Z = \frac{(53.03 - 50.20)}{(\frac{9.81}{\sqrt{80}})} \approx 2.58

Step 4: Determine the Z Critical Value

For an upper-tailed test with α = 5%, the z critical value is the boundary below which 95% of the z-distribution lies. In Excel, use the formula: =NORM.S.INV(0.95) This gives us a z critical value of 1.65.

Step 5: Make a Decision

Compare the calculated z-statistic (2.58) to the critical value (1.65). Since 2.58 > 1.65, we reject the null hypothesis. Figure 16 illustrates these results. You can see that the z-statistic is outside the fail to reject region and within the rejection region (orange area).

Figure 16 Z critical value and z-statistic. The image depicts a bell-shaped curve, representing a standard normal distribution, with areas marked for the ‘fail to reject’ region and the ‘rejection’ region. The curve shows the z critical value at 1.65, which marks the boundary between these regions. A z-statistic of 2.58 is located in the shaded rejection region, indicating that any test statistic beyond this point would lead to rejecting the null hypothesis. Step 6: Interpret the Results There is sufficient statistical evidence to support the marketing executive’s claim that the new advertising campaign has increased the average number of monthly purchases. The sample data suggests that the true population mean of monthly purchases after the campaign is significantly higher than the population average of 50.20, at a 95% confidence level. Activity 4: Z-test Allow around 60 minutes for this activity. Now, let us apply our knowledge of z-tests to a new scenario in the same marketing context. Consider the following question: Following their successful advertising campaign, the marketing team at the digital marketing agency has decided to explore a new strategy. They have chosen to sponsor a major football event, believing this sponsorship will further boost the number of purchases on their e-commerce platform. After the sponsored event concluded, the marketing team collected data from a sample of 98 customers to assess the impact of this new initiative. The marketing executive is now claiming that the sponsorship has decreased the average number of purchases. Download the Excel file to review the data. Excel file: Sport sponsorship Your task is to conduct a z-test to evaluate the marketing executive’s claim about the ineffectiveness of the football event sponsorship. You should use the same data from the previous advertising campaign example for the population mean and standard deviation. Show your calculation in the box below. 
Let us solve this problem step by step using Excel:

Step 1: Find the Population Mean and Standard Deviation

We use the same population data from the previous example:

Population mean (µ) = 50.20
Population standard deviation (σ) = 9.81

Step 2: Formulate Hypotheses

Based on the marketing executive’s claim, we can formulate our hypotheses:

H₀: The sponsorship did not decrease the average number of purchases (µ ≥ 50.20).
H₁: The sponsorship decreased the average number of purchases (µ < 50.20).

Step 3: Calculate the Z-statistic

We use the formula:

Z = \frac{\bar{x} - μ}{(\frac{σ}{\sqrt{n}})}

Where:

µ = population mean under the null hypothesis (50.20)
σ = population standard deviation (9.81)

To calculate the Z-statistic, we also need the sample mean (\bar{x}) and the sample size (n).

Sample size (n): The question states that the team collected data from a sample of 98 customers, and you can also confirm this from the dataset. Therefore, n = 98.

Sample mean (\bar{x}): Use the Excel formula =AVERAGE() on the sample data in the dataset. The result of this calculation gives us \bar{x} = 48.73.

Substituting these values into the formula:

Z = \frac{(48.73 - 50.20)}{(\frac{9.81}{\sqrt{98}})} \approx -1.48

Step 4: Determine the Z Critical Value For a lower tailed test, α = 5%, z critical value corresponds to the boundary = 5% of the z-distribution: In Excel, use the formula: =NORM.S.INV(0.05) This gives us a z-critical value of -1.645. Step 5: Decision We fail to reject the null hypothesis because -1.48 > -1.645. The calculated z-statistic (-1.48) does not fall in the rejection region (z < -1.645).

Figure 17 Z critical value and z-statistic

The image depicts a bell-shaped curve, representing a standard normal distribution, with areas marked for the ‘fail to reject region’ and the ‘rejection region’. The curve shows the z critical value at −1.65, which marks the boundary between these regions. A z-statistic of −1.48 is located in the fail to reject region, to the right of the critical value; only a test statistic below −1.65 would lead to rejecting the null hypothesis.

Step 6: Interpretation

There is insufficient statistical evidence to support the marketing executive’s claim that the sponsorship decreased the number of purchases. However, it is important to note that the sample mean is lower than the population average, which aligns with the executive’s concern about the sponsorship’s effectiveness. While we cannot conclude a statistically significant decrease in purchases, the data suggests that the sponsorship may not have had the desired positive impact.

The focus on the lower tail of the distribution is appropriate in this scenario, as we are testing for a potential decrease in purchases. This approach aligns with the marketing executive’s scepticism about the sponsorship’s success and allows us to directly evaluate the claim of ineffectiveness.
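The two z-tests worked through in this section (the advertising campaign and the football sponsorship) can be reproduced with a short Python sketch. The summary figures are the ones reported in the text; the helper name `z_statistic` is introduced here for illustration, and `NormalDist.inv_cdf` is used in place of Excel's NORM.S.INV().

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


alpha = 0.05
mu, sigma = 50.20, 9.81  # population mean and standard deviation from the text

# Advertising campaign: upper-tailed test (H1: mu > 50.20), n = 80.
z_campaign = z_statistic(53.03, mu, sigma, 80)
upper_critical = NormalDist().inv_cdf(1 - alpha)
reject_campaign = z_campaign > upper_critical

# Football sponsorship: lower-tailed test (H1: mu < 50.20), n = 98.
z_sponsorship = z_statistic(48.73, mu, sigma, 98)
lower_critical = NormalDist().inv_cdf(alpha)
reject_sponsorship = z_sponsorship < lower_critical

print(f"campaign:    z = {z_campaign:.2f}, critical = {upper_critical:.3f}, reject H0: {reject_campaign}")
print(f"sponsorship: z = {z_sponsorship:.2f}, critical = {lower_critical:.3f}, reject H0: {reject_sponsorship}")
# campaign:    z = 2.58, critical = 1.645, reject H0: True
# sponsorship: z = -1.48, critical = -1.645, reject H0: False
```

Both tests use the same formula for the z-statistic; only the tail, and therefore the critical value and the direction of the comparison, differs.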

5.4 Finding \bar{x}

Building upon our previous knowledge of z-tests, we will now apply this statistical method to answer a new question in the field of marketing. Consider the following scenario:

Historical records show that customers need an average of 10 seconds of exposure to television advertising commercials before being influenced, with a standard deviation of 1.6 seconds. A marketing manager believes it now takes longer to influence customer behaviour. To support this claim, he plans to sample 100 customers. We need to determine what average exposure time the sample must show in order to justify the marketing manager’s claim with 90% confidence.

This question is vital because marketing managers are not particularly concerned with abstract statistical concepts like the z critical value, rejection region or fail to reject region. Instead, they need practical, actionable insights to guide their advertising strategies.

Define the Hypotheses

H₀: μ ≤ 10 seconds (The average time to influence customers has not increased)
H₁: μ > 10 seconds (The average time to influence customers has increased)

Set the Confidence Level and Critical Value

Confidence level: 90% (α = 0.10 for a one-tailed test)
Z critical value: using the Excel function =NORM.S.INV(0.90), we get approximately 1.28

Calculate the Sample Mean

Z = \frac{\bar{x} - μ}{(\frac{σ}{\sqrt{n}})}

Where:

\bar{x} = sample mean

μ = population mean

σ = population standard deviation

n = sample size

We can input the known values into this formula and solve for \bar{x}.

Step 1:

1.28 = \frac{\bar{x} - 10}{(\frac{1.6}{\sqrt{100}})}

Step 2:

1.28 \times \frac{1.6}{\sqrt{100}} = \bar{x} - 10

Step 3:

(1.28 \times \frac{1.6}{\sqrt{100}}) + 10 = \bar{x}

Step 4:

\bar{x} \approx 10.20

To justify the marketing manager’s claim with 90% confidence, the sample of 100 customers must show an average time to influence customer behaviour of more than 10.20 seconds. In practical terms:

If the sample average is 10.20 seconds or less, we do not have sufficient evidence to reject the null hypothesis. This means we cannot conclude that the average time to influence customers has increased.
If the sample average exceeds 10.20 seconds, we have statistical evidence at a 90% confidence level to reject the null hypothesis and support the marketing manager’s claim that it now takes longer to influence customer behaviour.

The marketing team should conduct their survey of 100 customers and calculate the average time it takes to influence their behaviour. If this average is greater than 10.20 seconds, it provides support for the marketing manager’s assertion. This result suggests that an increase of more than 0.20 seconds in the average time could be statistically significant given the sample size and variance in the data.

The marketing team should consider practical significance alongside statistical significance. While an increase to just over 10.20 seconds is statistically significant, they should evaluate whether this small increase has meaningful implications for their advertising strategies.

Activity 5: Determine the sample mean to reject H₀

Allow around 40 minutes for this activity.

According to store records, customers are persuaded to purchase products with an average price discount of 50% in marketing promotions, with a standard deviation of 5.3%. The market research team believe that the company should offer more than 50% price discounts to motivate customers’ purchase intentions. To test this claim, the team plan to survey 1,000 customers. We need to determine the decision rule at a 99% confidence level to accept the market research team’s claim. Use the textbox below to show your calculations.
Step 1: Define the Hypotheses

H₀: μ ≤ 50% (Customers are persuaded by price discounts of 50% or less)
H₁: μ > 50% (Customers require price discounts greater than 50% to be persuaded)

Step 2: Identify the z-score and rejection region

For a one-tailed test at a 99% confidence level (α = 0.01), using the Excel function =NORM.S.INV(0.99), we get a z critical value of 2.326 ≈ 2.33.

Step 3: Identify all the values

μ = 0.50 (50% expressed as a decimal)

σ = 0.053 (5.3% expressed as a decimal)

n = 1,000

z = 2.33 (approximate z critical value for a 99% confidence level)

Step 4: Calculate the Sample Mean

2.33 = \frac{\bar{x} - 0.5}{(\frac{0.053}{\sqrt{1000}})}

Then

\bar{x} = 0.5 + 2.33 \times \frac{0.053}{\sqrt{1000}} \approx 0.504 = 50.4\%

Step 5: State the Decision Rule The market research team should reject the null hypothesis (and accept their claim that customers require more than 50% discount) if the survey of 1,000 customers shows that, on average, customers are persuaded to purchase products when the price discount is greater than 50.4%.
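Both threshold calculations in this section rearrange the z-statistic formula to \bar{x} = μ + z × σ/√n. The Python sketch below applies that rearrangement to the television advertising example and to Activity 5. The function name `rejection_threshold` is hypothetical, and the rounded critical values from the text (1.28 and 2.33) are passed in directly so the results match the worked solutions; using unrounded critical values would shift the thresholds slightly.

```python
from math import sqrt


def rejection_threshold(pop_mean: float, pop_sd: float, n: int, z_critical: float) -> float:
    """Sample mean above which H0 is rejected in an upper-tailed z-test.

    Rearranged from z = (x_bar - mu) / (sigma / sqrt(n)).
    """
    return pop_mean + z_critical * pop_sd / sqrt(n)


# Television advertising example: mu = 10 s, sigma = 1.6 s, n = 100, z = 1.28.
tv_threshold = rejection_threshold(10, 1.6, 100, 1.28)

# Activity 5: mu = 0.50, sigma = 0.053, n = 1000, z = 2.33.
discount_threshold = rejection_threshold(0.50, 0.053, 1000, 2.33)

print(f"TV exposure threshold: {tv_threshold:.2f} seconds")   # 10.20 seconds
print(f"Discount threshold:    {discount_threshold:.1%}")     # 50.4%
```

Because the standard error shrinks with √n, the large sample in Activity 5 places the threshold only about 0.4 percentage points above the hypothesised 50%.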

6 P-value

In our previous discussion, we examined a case study involving a digital marketing agency’s new advertising campaign. We used a z-test to evaluate the claim that the campaign had increased the average number of purchases. Let us recap our findings and delve deeper into their interpretation, focusing on the crucial concept of p-values and their role in statistical analysis.

We calculated a z-statistic of 2.58, which exceeded our z critical value of 1.65 (at a 95% confidence level). This led us to reject the null hypothesis, concluding that there was sufficient evidence to support the marketing executive’s claim of increased purchases. This is illustrated in the figure below.

Figure 18 Z critical value and z-statistic

The image depicts a bell-shaped curve, representing a standard normal distribution, with areas marked for the ‘fail to reject’ region and the ‘rejection’ region. The curve shows the z critical value at 1.65, which marks the boundary between these regions. A z-statistic of 2.58 is located in the shaded rejection region, indicating that any test statistic beyond this point would lead to rejecting the null hypothesis.

Nevertheless, the z-statistic of 2.58 provides more information than simply allowing us to make a binary decision about rejecting or failing to reject the null hypothesis. It tells us how many standard errors our sample mean lies from the hypothesised population mean, assuming the null hypothesis is true.

While the z critical value approach is useful, it does not tell us the exact probability of obtaining our result if the null hypothesis were true. This is where the concept of p-values becomes invaluable.

The p-value is a fundamental concept in statistical analysis used to quantify the statistical significance of observed results. It represents the probability of obtaining an effect equal to or more extreme than the one observed, assuming that the null hypothesis is true.

In our context, the p-value represents the probability of obtaining a test statistic as extreme as, or more extreme than, our observed z-statistic of 2.58, assuming the null hypothesis (the advertising campaign had no or negative effect) is true.

Key Aspects of P-Values:

1. Null Hypothesis: The p-value is based on two hypotheses:

H₀: Typically assumes no difference or effect. In our case, H₀ stated that the digital marketing campaign would produce equal or less than average monthly purchase results (µ ≤ 50.20).
H₁: Assumes the null hypothesis is untrue. Our H₁ stated that the campaign would produce greater than average monthly purchase results (µ > 50.20).

2. Significance Level: Often denoted as alpha (α), the significance level is the threshold below which a p-value is considered statistically significant. In our original analysis, we used a 95% confidence level, which corresponds to a significance level (α) of 5% or 0.05.

3. Interpretation: A smaller p-value suggests stronger evidence against the null hypothesis. However, it is crucial to note that the p-value does not indicate the size or importance of an observed effect.

6.1 Calculating the p-value

To find the p-value corresponding to the z-statistic, we use Excel’s NORM.S.DIST(z, cumulative) function. The calculation method depends on the specific null hypothesis you are testing. Let us explore this in detail.

NORM.S.DIST(z, cumulative) function, where:
"z" is the calculated z-statistic
"cumulative" is a logical value: TRUE returns the cumulative distribution function, FALSE returns the probability density function

For hypothesis testing, we typically use the cumulative distribution function (TRUE).

Calculating p-values based on the null hypothesis:

1. For a two-tailed test: First, take the absolute value of your z-statistic, which is denoted as |z|.
p-value = 2 * (1 - NORM.S.DIST(|z|, TRUE))
We use the absolute value in two-tailed tests because we are interested in the magnitude of the deviation from the null hypothesis, regardless of its direction. This approach leverages the symmetry of the normal distribution and simplifies our calculations.

2. For a one-tailed test (upper tail), where the alternative hypothesis states that the mean is greater than the hypothesised value:
p-value = 1 - NORM.S.DIST(z, TRUE)

3. For a one-tailed test (lower tail), where the alternative hypothesis states that the mean is less than the hypothesised value:
p-value = NORM.S.DIST(z, TRUE)

This distinction is crucial because the direction of your hypothesis determines which area under the normal distribution curve you measure. By correctly aligning your p-value calculation with your hypotheses, you ensure accurate interpretation of your statistical results.

Explanation in terms of probability:

Two-tailed test: The p-value represents the probability of observing a test statistic as extreme as, or more extreme than, the calculated z-statistic in either direction, assuming the null hypothesis is true. We multiply by 2 to account for both tails of the distribution.

Upper-tail test: The p-value represents the probability of observing a test statistic as extreme as, or more extreme than, the calculated z-statistic in the upper tail, assuming the null hypothesis is true.

Lower-tail test: The p-value represents the probability of observing a test statistic as extreme as, or more extreme than, the calculated z-statistic in the lower tail, assuming the null hypothesis is true.

Remember, the z-statistic represents how many standard deviations your sample mean is from the population mean under the null hypothesis. The NORM.S.DIST() function then translates this z-statistic into a probability, allowing you to make informed decisions about rejecting or failing to reject your null hypothesis in various business scenarios.

In our marketing campaign (upper-tail test) example, we calculated a z-statistic of 2.58. To find the probability represented by this z-statistic, we use: =NORM.S.DIST(2.58, TRUE)

This returns approximately 0.9951, or 99.51%. Under the null hypothesis (assuming the marketing campaign had no effect), there is a 99.51% chance that we would observe a z-statistic less than or equal to 2.58.

The p-value represents the probability of obtaining a result as extreme as, or more extreme than, the observed result, assuming the null hypothesis is true. In our one-tailed test looking for an increase, “more extreme” means values greater than our observed z-statistic. To calculate the p-value, we find the area in the tail beyond our z-statistic:

p-value = 1 - 0.9951 = 0.0049 or 0.49%

To visualise this (Figure 20):
The area to the left of z = 2.58 under the standard normal curve is the cumulative probability (approximately 99.51% of the total area).
The p-value is the remaining area to the right of z = 2.58 (approximately 0.49% of the total area).

Figure 20 P-value

The image shows a bell curve representing a standard normal distribution, with a z-statistic of 2.58 marked on the far right side of the curve. The shaded area under the curve, covering 99.51% of the total area, represents the cumulative probability up to the z-value of 2.58. The remaining 0.49% of the area, located beyond the z-statistic, corresponds to the p-value.

For our marketing campaign example, this means:
Under the null hypothesis (assuming the campaign had no effect), there is a 99.51% chance that we would observe a z-statistic less than or equal to 2.58.
There is only a 0.49% chance (our p-value) of observing a z-statistic of 2.58 or greater if the null hypothesis were true.

The p-value is smaller than the conventional significance level of 5%. This provides strong evidence against the null hypothesis, supporting our decision to reject it and conclude that the marketing campaign likely had a significant positive effect on purchases.
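The three p-value rules from Section 6.1 can also be checked outside Excel. The following is a minimal sketch in Python, assuming the standard library's NormalDist class is an acceptable stand-in for NORM.S.DIST(z, TRUE). The function name p_value_from_z and the tail labels "upper", "lower" and "two" are illustrative choices, not part of the course material.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="upper"):
    """Return the p-value for a z-statistic under the standard normal curve.

    tail is a hypothetical label for this sketch: "upper", "lower" or "two".
    """
    cdf = NormalDist().cdf  # plays the role of NORM.S.DIST(z, TRUE)
    if tail == "upper":
        return 1 - cdf(z)              # area to the right of z
    if tail == "lower":
        return cdf(z)                  # area to the left of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # both tails, using |z|
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Marketing campaign example: upper-tail test with z = 2.58
z_stat = 2.58
alpha = 0.05
cumulative = NormalDist().cdf(z_stat)
p_upper = p_value_from_z(z_stat, "upper")

print(f"cumulative probability: {cumulative:.4f}")
print(f"upper-tail p-value:     {p_upper:.4f}")
print("reject H0" if p_upper < alpha else "fail to reject H0")
```

Comparing the p-value with α leads to the same decision as comparing the z-statistic with the z critical value, so either route can be used.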

7 Hypothesis testing for population proportions In previous sessions, we focused on problems involving population and sample means. However, the z-test has broader applications, including its use in solving problems related to population proportions. Proportions, also known as relative frequencies, represent the fraction of items in a specific group or category within a larger sample. We calculate proportions by dividing the number of items in a particular group by the total number of items in the sample. This calculation provides a representation of the quantity of items belonging to a specific category, typically expressed as a fraction or percentage.

In the realm of statistical and data analysis, proportions serve several important functions: Characterising variable distributions: Proportions help us understand how different categories or values are distributed within a dataset. Contrasting distinct groups: By comparing proportions, we can identify differences or similarities between various categories or subsets within a sample. Summarising categorical data: Proportions offer a concise way to present information about categorical variables, making it easier to grasp the composition of a dataset. Hypothesis testing: We can use proportions to test hypotheses about population parameters, similar to how we use means in other statistical tests. The application of proportions in statistical analysis allows decision-makers to draw meaningful conclusions about population characteristics based on sample data. This approach proves particularly useful when dealing with categorical variables or when we need to understand the relative occurrence of specific attributes within a population. The z-test for proportions, specifically, allows us to make inferences about population proportions based on sample data. This test helps determine whether an observed sample proportion differs significantly from a hypothesised population proportion. 7.1 Practical example To illustrate the concept of proportions, let us revisit our digital marketing dataset containing 80 customers in a sample. In addition to the monthly purchases data, we also have a column labelled "big customer". This column identifies customers who have made over 55 purchases per month. In this sample, we find that 32 customers are identified as big customers. Using this dataset, we can demonstrate how to calculate and interpret proportions: Identify the specific group: Our group of interest is “big customers”. Count the number of items in the group: We have 32 customers classified as “big customers”. 
Calculate the proportion: We divide the number of big customers by the total sample size. Proportion =

\frac{32}{80}

= 0.4 or 40% This proportion tells us that 40% of the customers in our sample are considered "big customers" based on our definition. We can use this proportion for various analytical purposes: Describing the customer base: We can state that two-fifths of our sampled customers are high-volume purchasers, making over 55 purchases per month. Segmentation analysis: This proportion could inform marketing strategies, helping to tailor approaches for different customer segments. With 40% being big customers, we might consider developing specific retention strategies for this significant group. Trend analysis: By calculating this proportion over time, we can track changes in the composition of our customer base. An increase in this proportion might indicate successful upselling strategies, while a decrease could signal a need for customer retention efforts. Hypothesis testing: We might want to test whether the true population proportion of big customers differs from a hypothesised value. In this case, we would use a z-test for proportions. Here, we are particularly interested in hypothesis testing. For example, a marketing executive reviews this data and formulates a claim that the percentage of big customers in the entire population is more than 30%. We want to test this claim using a hypothesis test. Question: At a 95% confidence level, is there enough evidence to support the marketing executive’s claim that the population proportion of big customers is more than 30%? We will use the following formula to test proportions:

Z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0} (1 - p_{0})}{n}}}

p̂ = sample proportion p₀ = population proportion under the null hypothesis n = sample size Step 1: Formulate the hypotheses H₀: p ≤ 0.30 (population proportion is 30% or less) H₁: p > 0.30 (population proportion is more than 30%) Step 2: Determine the z-statistic Given information: n = 80 customers p̂ = 0.4 p₀ = 0.3 We use the z-test for proportions. The formula is:

Z = \frac{0.4 - 0.3}{\sqrt{\frac{0.3 (1 - 0.3)}{80}}} = 1.952
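The substitution above, together with the critical-value comparison that the example goes on to make, can be reproduced in a few lines of Python. This is a sketch only: the variable names are illustrative, and NormalDist().inv_cdf is assumed here as a stand-in for Excel's NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the big-customer example
n = 80                 # sample size
big_customers = 32     # customers with over 55 purchases per month
p0 = 0.30              # proportion under the null hypothesis
alpha = 0.05           # significance level (95% confidence level)

p_hat = big_customers / n                    # sample proportion
standard_error = sqrt(p0 * (1 - p0) / n)     # uses p0, not p_hat
z_stat = (p_hat - p0) / standard_error

# Upper-tailed test: all of alpha sits in the right tail
z_crit = NormalDist().inv_cdf(1 - alpha)     # like =NORM.S.INV(0.95)
reject_h0 = z_stat > z_crit

print(f"sample proportion: {p_hat:.2f}")
print(f"z-statistic:       {z_stat:.3f}")
print(f"z critical value:  {z_crit:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Note that the standard error is built from the hypothesised proportion p₀ rather than the sample proportion, because the test statistic is evaluated under the assumption that the null hypothesis is true.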

Step 3: Determine the critical value For an upper tailed test, at α = 5%, we use the Excel function: =NORM.S.INV(0.95) The z critical value is 1.645. Step 4: Compare the z-statistic to the critical value and make a decision. Our calculated z-statistic (1.952) is greater than the critical value (1.645). Since the calculated z-statistic exceeds the critical value, we reject the null hypothesis. Conclusion: At a 95% confidence level, there is enough evidence to support the marketing executive’s claim that the population proportion of big customers is more than 30%. The sample data provides sufficient evidence to conclude that the true population proportion is significantly higher than 30%. Activity 6: Testing Proportion Hypothesis Allow around 40 minutes for this activity. A cinema chain has records indicating that 70% of customers are willing to spend £15 on a 3D movie ticket. A ticket agent believes that this percentage has changed. To test this belief, the agent surveys 200 individuals and finds that 128 of them are willing to spend £15 on a 3D movie ticket. At the 95% confidence level, is there enough evidence to support the agent’s belief that the proportion has changed? Given information: Sample size (n) = 200 individuals Number willing to spend £15 = 128 Sample proportion (p̂) = 128/200 = 0.64 or 64% Claimed population proportion (p₀) = 0.70 or 70% Confidence level = 95% (significance level α = 0.05) Step 1: Formulate the hypotheses H₀: p = 0.70 (population proportion is 70%) H₁: p ≠ 0.70 (population proportion is not 70%) Step 2: Determine the z-statistic We use the z-test for proportions. The formula is:

Z = \frac{0.64 - 0.70}{\sqrt{\frac{0.70 (1 - 0.70)}{200}}} = -1.852
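Because the sample proportion (0.64) is below the hypothesised proportion (0.70), the z-statistic for the cinema data is negative. A hedged Python sketch of the two-tailed version of the test is given here. The function name two_tailed_proportion_test is hypothetical, and the p-value it reports is an addition for illustration; the activity itself only uses critical values.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_proportion_test(successes, n, p0, alpha=0.05):
    """Return (z, z_crit, p_value) for H0: p = p0 against H1: p != p0."""
    std_normal = NormalDist()
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # alpha is split equally between the two tails
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # like =NORM.S.INV(0.975)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))        # both tails, using |z|
    return z, z_crit, p_value


# Cinema example: 128 of 200 respondents, hypothesised proportion 70%
z_stat, z_crit, p_value = two_tailed_proportion_test(128, 200, 0.70)

print(f"z-statistic:       {z_stat:.3f}")
print(f"critical values:   ±{z_crit:.2f}")
print(f"two-tailed p-value: {p_value:.4f}")
print("reject H0" if abs(z_stat) > z_crit else "fail to reject H0")
```

In a two-tailed test the decision depends on whether |z| exceeds the positive critical value, which is equivalent to checking whether z falls outside the interval between the lower and upper critical values.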

Step 3: Determine the critical value For a two-tailed test, at α = 5%, use the Excel function: Upper tail = NORM.S.INV(0.975) Lower tail = NORM.S.INV(0.025) The z critical value is ±1.96. Step 4: Compare the z-statistic to the critical value and make a decision Our calculated z-statistic (-1.852) is greater than -1.96 and less than 1.96. Since the calculated z-statistic does not exceed the critical values (±1.96), we fail to reject the null hypothesis. At a 95% confidence level, there is not enough evidence to support the ticket agent’s belief that the proportion of customers willing to spend £15 on a 3D movie ticket has changed from 70%. The sample data does not provide sufficient evidence to conclude that the true population proportion is significantly different from 70%. 8 T-test In previous discussions, we examined hypothesis testing using z-tests, which we apply when we know the population’s standard deviation. However, real-world scenarios often present a different challenge: the population’s standard deviation is frequently unknown. This common occurrence requires us to adjust our approach. When decision-makers encounter an unknown population standard deviation, they estimate this crucial parameter using the sample’s standard deviation. This estimation process introduces additional uncertainty into our calculations, necessitating the use of a different statistical tool: the t-test.

8.1 One sample t-tests: comparing a sample mean against the population mean

The one-sample t-test operates similarly to the z-test, with the primary difference lying in how we determine the critical value. As you may recall from the earlier section, the shape of the t-distribution depends on the sample size, which slightly alters the t critical value we calculate.

Figure 23 Z-distribution vs t-distribution with different sample size

The graph illustrates the comparison between the z-distribution and two t-distributions with different sample sizes. The blue curve represents the z-distribution (normal distribution), while the orange curve shows the t-distribution with a smaller sample size, and the green curve depicts the t-distribution with a larger sample size. As the graph indicates, the t-distribution with a smaller sample size has the thickest tails, reflecting greater variability. As the sample size increases, the t-distribution (green curve) becomes more similar to the z-distribution, which has the narrowest peak and the thinnest tails, demonstrating how t-distributions converge to the normal distribution as sample size grows.

To fully comprehend this difference, we need to understand the concept of degrees of freedom (DF). Degrees of freedom represent the amount of independent information available for estimating statistical parameters. More degrees of freedom generally lead to more reliable estimates. For a one-sample t-test, we calculate degrees of freedom as the sample size minus one. The formula is:

DF = n - 1

Where: n represents the sample size

We subtract 1 because we estimate one parameter (the population mean) in a one-sample t-test. This reduction reflects the "cost" of estimating the unknown parameter. Larger sample sizes result in more degrees of freedom. This increase in DF improves the precision of our estimates and enhances the power of our statistical tests.

The t-statistic formula is typically expressed as:

t = \frac{\bar{x} - μ}{(\frac{s}{\sqrt{n}})}


Where:
x̄ = sample mean
μ = hypothesised population mean
s = sample standard deviation
n = sample size

To perform a one-sample t-test, we must calculate the t-statistic and then determine the t critical value. Unlike z critical values, which use a single function, Excel provides specific functions for calculating t critical values for both one-tailed and two-tailed tests. However, we can modify these functions to calculate values for either type of test.

8.1.1 Performing a One-Tailed One-Sample T-Test

For a one-tailed one-sample t-test, we use the T.INV function in Excel. The syntax is: =T.INV(probability, degrees_freedom)

Where:
probability: This value represents the cumulative probability for which you want to find the corresponding t-value. It determines the boundary of the t-distribution.
degrees_freedom: This equals n - 1, where n is the sample size. It defines the specific shape of the t-distribution.

For a two-tailed one-sample t-test, Excel provides the T.INV.2T function. For example, with α = 5% and 10 degrees of freedom, we would enter: =T.INV.2T(0.05, 10) which returns 2.228

Remember that the T.INV.2T function returns the two-tailed t critical value as a positive number, that is, the boundary on the positive side of the t-distribution. For a two-tailed test, the t-distribution has both positive and negative critical values. Therefore, when reporting the t critical value, it is necessary to use the ± sign to show both the upper and lower bounds of the critical region. Thus, the t critical value for a two-tailed test = ±2.228

This boundary is crucial because:
If your calculated t-statistic falls beyond this boundary, you reject the null hypothesis.
If your t-statistic falls within these boundaries, you fail to reject the null hypothesis.

We can still use T.INV(probability, degrees_freedom) to find the t critical value for a two-tailed one-sample t-test. You need to modify the probability by dividing the desired significance level by 2 to account for both tails. For instance, for a two-tailed test with α = 5%:
=T.INV(0.025, 10) for the lower tail
=T.INV(0.975, 10) for the upper tail

8.1.3 Practical Example

Let us explore the application of a one-sample t-test in a practical business scenario. Consider our previous digital marketing example, where a company’s marketing team has implemented an advertising campaign aimed at increasing customer purchases beyond 50 purchases per customer. To assess the campaign’s effectiveness, they have collected data from a sample of 80 customers. Download the Excel file to review the data. Excel file: Digital marketing (You may have already downloaded this file when working through Section 5.3.)
The marketing executive believes the campaign has been successful, hypothesising that the average number of purchases has increased compared to the pre-campaign average. They want to test this claim rigorously, using a 95% confidence level. This scenario presents an ideal opportunity to apply the one-sample t-test, a statistical tool that allows us to determine whether a sample mean significantly differs from a known or hypothesised population mean.

Step 1: Formulate the Hypotheses
H₀: μ ≤ 50
H₁: μ > 50
This formulation represents a one-tailed test, as we are specifically testing for an increase in purchases. The value 50 serves as our benchmark, possibly representing the pre-campaign average or a target set by the marketing team.

Step 2: Collect and Analyse Sample Data
In our digital marketing campaign example, we have gathered the following data:
n = 80
x̄ = 53.025
s = 10.22 (because we are now working with a sample, use the STDEV.S() function to find the standard deviation)

Step 3: Calculate the T-Statistic
To quantify the difference between our sample mean and the hypothesised population mean, we calculate the t-statistic using the formula:

t = \frac{\bar{x} - μ}{(\frac{s}{\sqrt{n}})} = \frac{53.025 - 50}{(\frac{10.22}{\sqrt{80}})} = 2.647

Step 4: Determine the t critical value
For an upper-tailed test with α = 0.05 and 79 degrees of freedom (n - 1), we input: =T.INV(0.95,79) The formula returns the t critical value, which in this case is approximately 1.664.

Step 5: Compare the T-Statistic to the T Critical Value
We now compare our calculated t-statistic to the critical value:
Calculated t-statistic: 2.647
T critical value: 1.664
Since our calculated t-statistic (2.647) is greater than the critical t-value (1.664), we reject the null hypothesis.

Step 6: Draw Conclusions and Interpret Results
Based on our analysis, we have strong statistical evidence to conclude that the true average number of customer purchases after the marketing campaign is significantly higher than 50.
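The arithmetic in Steps 3 to 5 can be verified with a short Python sketch. It assumes the summary statistics quoted in the example and simply copies the critical value that =T.INV(0.95,79) returns in Excel, because Python's standard library has no built-in t-distribution. All variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the digital marketing example
n = 80                # sample size
sample_mean = 53.025  # x-bar
sample_sd = 10.22     # s, the sample standard deviation (STDEV.S in Excel)
mu_0 = 50             # hypothesised population mean under H0

# Assumption: critical value copied from Excel's =T.INV(0.95, 79)
t_critical = 1.664

degrees_of_freedom = n - 1
standard_error = sample_sd / sqrt(n)          # s / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Upper-tailed test: reject H0 when the t-statistic exceeds the critical value
reject_h0 = t_stat > t_critical

print(f"degrees of freedom: {degrees_of_freedom}")
print(f"standard error:     {standard_error:.4f}")
print(f"t-statistic:        {t_stat:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The only difference from the z-statistic calculation is that the sample standard deviation s replaces the population standard deviation σ, which is why the result is compared against a t critical value rather than a z critical value.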

8.2 P-value

We can also find the p-value for a t-test using Excel functions. The choice of function is less straightforward than for the z-test, because Excel provides three different functions, one for each type of test.

Upper-Tailed (One-Tailed) t-Test: Formula: =T.DIST.RT(x, deg_freedom)

Where:
You input the calculated t-statistic as “x”
deg_freedom is the number of degrees of freedom

This function is used when we are testing if a sample mean is significantly greater than a hypothesised value. In our marketing campaign case, we found that our t-statistic = 2.647. We can use this function to find the p-value: =T.DIST.RT(2.647, 79) This yields a p-value of approximately 0.005, indicating strong evidence that the campaign increased purchases.

After exploring how you can calculate p-values for upper-tailed t-tests, it is important to understand that Microsoft Excel also offers a specific function for determining p-values in lower-tailed t-tests. You will find this function particularly valuable when you need to assess whether a sample mean is significantly less than a hypothesised value. Excel’s built-in capabilities will streamline your process, allowing you to perform efficient and accurate statistical analysis without resorting to manual calculations.

You will encounter lower-tailed t-tests in various research scenarios where you focus on determining if a sample statistic is significantly lower than a population parameter. For example, in quality control, you might test whether the mean weight of a product is significantly less than the specified standard. In medical research, you could investigate whether a new treatment reduces symptoms significantly more than an existing treatment.

Lower-Tailed (One-Tailed) t-Test: Formula: =T.DIST(x, deg_freedom, cumulative)

Where:
You input the calculated t-statistic as “x”
deg_freedom is the number of degrees of freedom
cumulative is set to “TRUE”

This function is used when we are testing if a sample mean is significantly less than a hypothesised value. The t-statistic is typically negative in this case, because we are looking at the lower tail of the distribution.

Having explored both upper-tailed and lower-tailed t-tests, you should now turn your attention to two-tailed t-tests. Microsoft Excel continues to support your statistical analysis needs by offering a function that enables you to calculate p-values for two-tailed t-tests efficiently. This type of test broadens your analytical capabilities, as you will find it particularly useful when you need to determine whether a sample mean differs significantly from a hypothesised value in either direction, without specifying a particular tail of the distribution.

Two-Tailed t-Test: Formula: =T.DIST.2T(x, deg_freedom)

Where:
You input the absolute value of the calculated t-statistic as “x”
deg_freedom is the number of degrees of freedom

This function is used when we are testing if a sample mean is significantly different (either higher or lower) from a hypothesised value. We use the absolute value of the t-statistic because we are considering both tails of the distribution.

Activity 7: One sample t-test

Allow around 60 minutes for this activity.

A marketing manager wants to assess whether a recent promotional campaign has increased the average sales per customer. Before the campaign, the average sales per customer were believed to be 35 units. The manager wants to test this claim rigorously at a 95% confidence level based on the sample data of 51 customers. Download the Excel file to review the data. Excel file: Sales dataset

Tip: The Excel function "Descriptive Statistics" provides all the information you need to calculate the t-statistic. You can access it via Data > Data Analysis > Descriptive Statistics.
Type B2:B52 in the Input Range box
Tick "Summary Statistics"
Click OK

What you will get
Excel will show you important numbers including:
Sample mean (x̄)
Sample standard deviation (s)
Count (sample size - n)

Step 1: Formulate Hypotheses
H₀: The average sales per customer is less than or equal to 35 units (H₀: μ ≤ 35).
H₁: The average sales per customer has increased (H₁: μ > 35).

Step 2: Obtain all the information needed to calculate the t-statistic
Sample size (n) = 51
Sample mean (x̄) = 38.235
Sample standard deviation (s) = 11.728

Step 3: Calculate the T-Statistic

t = \frac{\bar{x} - μ}{(\frac{s}{\sqrt{n}})} = \frac{38.235 - 35}{(\frac{11.728}{\sqrt{51}})} = 1.970

Step 4: Determine the t critical value
There are 51 customers in the sample, so the degrees of freedom = 51 - 1 = 50. For an upper-tailed test with α = 0.05 and 50 degrees of freedom, we input: =T.INV(0.95,50) The formula returns the t critical value, which in this case is approximately 1.676.

Step 5: Make the Decision
t-Statistic: 1.970
t-Critical Value: 1.676
Since the calculated t-statistic (1.970) is greater than the critical t-value (1.676), we reject the null hypothesis (H₀).

You can also calculate the p-value. Since this is a one-tailed (upper-tailed) test, use the T.DIST.RT function in Excel: =T.DIST.RT(1.970, 50) Excel will return a p-value of approximately 0.0277. This p-value is less than the significance level of 0.05, confirming the rejection of the null hypothesis.
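For readers who want to see what the three p-value functions in Section 8.2 compute, the sketch below approximates the t-distribution numerically using only Python's standard library. It is an illustration under stated assumptions, not a description of how Excel works internally: the cumulative probability is obtained by integrating the t density with Simpson's rule, and the critical value is found by bisection. The function names t_pdf, t_cdf, t_p_value and t_critical are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| (Simpson's rule)."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_p_value(t_stat, df, tail="upper"):
    """p-value for a t-statistic; tail is 'upper', 'lower' or 'two'."""
    if tail == "upper":   # corresponds to T.DIST.RT(x, deg_freedom)
        return 1 - t_cdf(t_stat, df)
    if tail == "lower":   # corresponds to T.DIST(x, deg_freedom, TRUE)
        return t_cdf(t_stat, df)
    if tail == "two":     # corresponds to T.DIST.2T(|x|, deg_freedom)
        return 2 * (1 - t_cdf(abs(t_stat), df))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


def t_critical(probability, df):
    """t-value with cumulative probability `probability`, like T.INV."""
    low, high = -50.0, 50.0
    for _ in range(60):   # bisection: halve the interval each time
        mid = (low + high) / 2
        if t_cdf(mid, df) < probability:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Marketing campaign example (t = 2.647, 79 degrees of freedom)
p_campaign = t_p_value(2.647, 79, "upper")
# Activity 7 (t = 1.970, 50 degrees of freedom)
p_sales = t_p_value(1.970, 50, "upper")
crit_sales = t_critical(0.95, 50)

print(f"campaign p-value:      {p_campaign:.4f}")
print(f"sales p-value:         {p_sales:.4f}")
print(f"sales critical value:  {crit_sales:.3f}")
```

Both upper-tail p-values fall below α = 0.05, which agrees with the decisions reached by comparing each t-statistic with its critical value.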

9 Summary In this course, you have grasped the fundamental principles of formulating a hypothesis, learning to distinguish between one-tailed and two-tailed tests. These skills enable you to draw meaningful conclusions from data across various business scenarios. You also explored z-tests and t-tests, two fundamental statistical tools essential for data analysis in business contexts. You learned the characteristics and applications of these tests, understanding when to use each based on factors like sample size, population parameters, and data nature. Through practical examples and exercises, you applied these methods to real-world business problems, reinforcing your understanding and building confidence in your analytical skills. You practised formulating hypotheses, calculating test statistics, and interpreting p-values to draw meaningful conclusions from data. By mastering these techniques, you have developed the ability to critically assess data-driven claims with confidence, a crucial skillset in the increasingly data-driven landscape of business and management. You also gained insights into the limitations and assumptions of these tests, enabling you to approach statistical analysis with a critical eye and avoid common pitfalls in data interpretation. 10 Reflection Now that you have reached the end of this course, complete the following reflection activity to check your progress against the key learning points. Activity 8: Reflection Allow around 15 minutes for this activity. This activity will help you to: Summarise your learning Reflect on the extent to which you have achieved key learning points Think about what you still need to do to fully achieve the key learning points Part A: In the table below, indicate whether you feel very confident (3), confident (2) or not so confident (1) about having achieved each of the key learning points for this course. Table 1

After working through this course, you should be able to:	At the end of this course, you feel:
Differentiate between null and alternative hypotheses for evaluations
Understand one-tailed or two-tailed tests in hypothesis testing
Understand when to use z-tests vs. t-tests
Perform z-tests for comparing means and proportions
Apply different t-tests based on sample characteristics
Evaluate critical values for hypothesis rejection

Part B: For each of the key learning points, write a few sentences in the spaces below summarising what you have learned, what you still feel you need to improve and how you might do that. This activity can identify areas for revision.
Differentiate between null and alternative hypotheses for evaluations.
Understand one-tailed or two-tailed tests in hypothesis testing.
Understand when to use z-tests vs. t-tests.
Perform z-tests for comparing means and proportions.
Apply different t-tests based on sample characteristics.
Evaluate critical values for hypothesis rejection.

11 Conclusion

The purpose of this course was to discuss hypothesis testing. Through the activities, you have gained a better understanding of the concept of alpha (α). You have learned the difference between a one-tailed test and a two-tailed test. Additionally, you have learned how to calculate z-scores and p-values as well as how to use them to determine whether a null hypothesis should be rejected. Finally, the end of this course helped you gain an understanding of how to conduct hypothesis testing for population proportions.

A second OpenLearn course on data analysis, Data analysis: visualisations in Excel, is now also available should you wish to take your studies further. This OpenLearn course is an adapted extract from the Open University course B126 Business data analytics and decision making.

References

Warner, R. M. (2021) Applied statistics: From bivariate through multivariate techniques. 3rd edn. Thousand Oaks, CA: Sage Publications.

Acknowledgements

This free course was written by the B126 Open University course team. Except for third party materials and otherwise stated (see terms and conditions), this content is made available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Licence. The material acknowledged below is Proprietary and used under licence (not subject to Creative Commons Licence).
Grateful acknowledgement is made to the following sources for permission to reproduce material in this free course: Course image: Image by congerdesign from Pixabay Figure 1: Dominique Deckmyn / www.cartoonstock.com Figure 2: Image by Jan Vaek from Pixabay Figure 3: Image by WikiImages from Pixabay Figure 4: Image by Tesa Robbins from Pixabay Figure 5: Mike Seddon / www.cartoonstock.com Figure 21: Tom Fishburne / Marketoonist.com Figure 29: Tom Fishburne / Marketoonist.com Every effort has been made to contact copyright owners. If any have been inadvertently overlooked, the publishers will be pleased to make the necessary arrangements at the first opportunity. Don't miss out If reading this text has inspired you to learn more, you may be interested in joining the millions of people who discover our free learning resources and qualifications by visiting The Open University – www.open.edu/openlearn/free-courses.
