4.2 T-test: unknown population standard deviation
You will use the t-test when you lack information about the population standard deviation, which is common in many real-world business scenarios. Key characteristics of the t-test include:
- It estimates the population standard deviation using sample data.
- The test statistic follows a t-distribution (similar to the normal distribution but with heavier tails).
- It is more widely applicable than the z-test in real-world scenarios where the population standard deviation is unknown.
The t-statistic formula is typically expressed as:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
Where:
x̄ = sample mean
μ₀ = hypothesised population mean
s = sample standard deviation
n = sample size
In the real world, we often cannot measure every single thing in a population. Instead, we take a smaller group (a sample) and use it to make educated guesses about the whole population. The formula uses “s”, which is calculated from our sample, to estimate how spread out the entire population might be.
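To make these definitions concrete, here is a minimal Python sketch that computes the t-statistic as the difference between the sample mean and the hypothesised mean, divided by s/√n. The function name `one_sample_t` and the daily order counts are hypothetical and chosen purely for illustration; only the standard library is used.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t-statistic of `sample` against the hypothesised mean `mu0`."""
    n = len(sample)                    # n: sample size
    x_bar = statistics.mean(sample)    # sample mean
    s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)  # estimated spread of the sample mean
    return (x_bar - mu0) / standard_error


# Hypothetical data: orders received on five days, tested against a
# hypothesised population mean of 48 orders per day.
daily_orders = [52, 48, 55, 50, 45]
t_stat = one_sample_t(daily_orders, mu0=48)
print(f"t = {t_stat:.3f}")  # t = 1.174
```

Here the sample mean is 50 and s is about 3.81, so the sample mean sits roughly 1.17 estimated standard errors above the hypothesised value.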
The t-distribution follows a pattern that differs from the normal distribution. If you drew many samples of the same size from a population whose mean really equals the hypothesised value and calculated the t-statistic for each one, the results would follow this t-distribution. While similar to the bell-shaped curve of the normal distribution, the t-distribution has heavier “tails”, meaning that extreme values are more likely. This characteristic accounts for the additional uncertainty that comes from estimating the population standard deviation from sample data.
Importantly, the degree of “fatness” in these tails is not constant. It depends on the sample size, a concept we will explore in more detail in the next section. Generally, as the sample size increases, the t-distribution more closely resembles the normal distribution. This relationship between sample size and the shape of the t-distribution plays a crucial role in statistical inference and will influence how you interpret your results in various business scenarios.
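The effect of sample size on the tails can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a formal procedure: it draws repeated samples from a standard normal population (so the hypothesised mean of 0 is true), computes the t-statistic for each sample, and counts how often it falls beyond ±1.96, the cutoff that leaves about 5% in the tails of a normal distribution. The function name `tail_fraction`, the trial count, and the seed are arbitrary choices.

```python
import math
import random
import statistics


def tail_fraction(n, trials=5000, cutoff=1.96, seed=0):
    """Fraction of simulated t-statistics with |t| > cutoff for samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = 0
    for _ in range(trials):
        # Draw a sample from a population with true mean 0 and standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # t-statistic against the (true) hypothesised mean of 0.
        t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > cutoff:
            extreme += 1
    return extreme / trials


small = tail_fraction(n=5)
large = tail_fraction(n=50)
print(f"n = 5:  share beyond ±1.96 = {small:.3f}")
print(f"n = 50: share beyond ±1.96 = {large:.3f}")
```

With only five observations per sample, the share of extreme t-statistics should come out well above 5% (in theory roughly 12%), while with fifty observations it should sit much closer to the 5% expected under the normal distribution.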
The t-test works well in real-life situations because its formula does not require exact information about the whole population. In most research and business studies we cannot observe an entire population, so the test lets us draw well-founded conclusions from a sample.
In particular, we do not need to know how spread out the whole population is. Notice that the formula does not include σ, the population standard deviation that appears in the z-statistic formula covered earlier. Because σ is rarely known in practice, we use s from our sample instead. This is a key reason for choosing a t-test in many situations.