
More confusing terms in statistics

Updated Thursday, 11th February 2016

Kevin McConway explains why statisticians use everyday words for not-so-everyday concepts - and how to translate what they really mean.


Like most technical subjects, statistics sometimes uses jargon that is pretty incomprehensible to outsiders. But statistics also has an unhelpful tendency to take words with a well-known everyday meaning and give them special meanings that differ from their everyday usage. This can get pretty confusing, whether you’re just beginning to study the subject, or if, for example, you have no experience of statistics but have to deal with something written by people who do understand the jargon.

I’ve written before on OpenLearn about this confusing habit of statisticians. In that article I concentrated on two words with enormous potential for being misunderstood, significant and reliable.

Is "significance" actually so significant?

But no apologies for returning (briefly) to significant and significance. Phrases like “this difference is statistically significant” crop up in all sorts of places, and even worse, sometimes “statistically” gets missed off, so it’s not clear that the technical meaning is intended rather than the everyday meaning, even if you know the difference. “Significant” means “important in some way”, right? Well, not necessarily in the way you’d think. If a difference or an effect is statistically significant, that means it may well not be due simply to chance or random variability. However, the difference may still not be large enough to be of any interest beyond the statistical.

Recent developments have made this potential confusion more important. The availability of very large quantities of data is much greater than it once was. Other things being equal, the larger a data set is, the smaller a difference or effect can be while still being declared statistically significant. That makes intuitive sense – the more data you have, the more information it (usually) gives you, so it becomes easier to distinguish what is and what isn’t chance variability. So in a very big set of data, a very small difference between two groups of people, too small to be of real interest to anyone, may still be big enough that it can’t be attributed just to random chance. In very big data sets, it can be terribly easy to find statistically significant effects that are of no practical or useful significance.
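To see how sample size drives statistical significance, here is a minimal sketch (all numbers invented for illustration): the same tiny difference of 0.02 between two group means gives wildly different p-values depending on how many observations there are per group.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic, via the normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# A tiny difference between two group means: 0.02 units, with each
# group's standard deviation equal to 1 (purely illustrative numbers).
diff, sd = 0.02, 1.0

for n in [100, 10_000, 1_000_000]:   # observations per group
    se = sd * math.sqrt(2 / n)       # standard error of the difference
    z = diff / se
    print(n, round(two_sided_p_from_z(z), 4))
```

With 100 people per group the difference is nowhere near significant; with a million per group it is overwhelmingly significant, even though the difference itself is just as trivially small as before.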

[Cartoon: Dilbert strip about statistics. Copyright: Scott Adams/Universal Uclick]

A further point that you may have heard about is that the whole notion of statistical significance - and the procedures called significance tests that are used to determine it - has come under attack as an inappropriate way of judging whether anything more than chance is involved. But that doesn’t detract from my main point. Actually, significance testing has been controversial to varying degrees ever since it was first invented – but whatever method is used to assess whether a difference is too big to be attributed to random variability alone – whether it is statistically significant or not – all statisticians would agree that not all differences that go beyond pure chance are really big enough to be practically significant. Statistics can help you in deciding what’s practically important, but it can’t make that decision for you on its own.

There’s rather more detail from me on statistical significance in my previous OpenLearn article, and also (in the context of extra-sensory perception) in this article for the Understanding Uncertainty blog. I also previously discussed “reliability” – but there are many other potentially problematic words to explore that mean something a bit different to statisticians than they do to everyone else. I’ve commented on a few of them in the table below.

More confusing statistical terms - and what they really mean




Censored: This has a few statistical meanings, but most commonly it means an observation where you don’t know the exact value but you know it’s larger than some given value. This arises, for instance, in studies where you are following people’s progress over time and seeing how long it is till something happens (e.g. they die, or they are cured of some disease). At some point the study has to stop, and for anyone who is still alive (or not cured) at that point, you won’t know exactly how long they will live or how long it will be till they are cured. Even while the study is still running, some people will be lost to follow-up (in the jargon), for example because they have moved away. For the people who are still alive, or haven’t been cured yet or whatever, when you stop observing them, you know that they must have survived at least as long as the length of time you’ve been observing them, but you don’t know how much longer. The resulting observation is said to be censored.
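As a rough illustration of how censored survival times are often represented (all times and the study itself are made up): each subject gets a time and a flag saying whether the event was actually observed, and naively averaging the times as if they were all event times understates survival, because the censored times are only lower bounds.

```python
# Each observation: (time in months, event_observed).
# event_observed=False means the time is censored: the person was
# still alive (or uncured) when observation stopped, so the true
# time is at least this large. All numbers are invented.
observations = [
    (3.0, True), (5.5, True), (12.0, False),   # censored at 12 months
    (7.2, True), (12.0, False), (9.1, True),
]

observed = [t for t, seen in observations if seen]
censored = [t for t, seen in observations if not seen]

# Treating every time as an event time biases the average downward,
# since censored subjects actually survived longer than recorded.
naive_mean = sum(t for t, _ in observations) / len(observations)
print(f"{len(censored)} censored out of {len(observations)}")
print(f"naive mean (biased low): {naive_mean:.2f} months")
```

Proper survival methods (such as the Kaplan–Meier estimator) exist precisely to use censored observations without this bias.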


Deviance: A particular measure of how well some data fit a statistical model.


Deviation: The difference between an observation (i.e. a data value) and the mean (or alternatively the difference between an observation and some value given by a statistical model). Probably most familiar as ‘standard deviation’, a particular kind of average deviation.
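A quick sketch of the idea, using Python’s standard library and invented data: the deviations are the differences from the mean, and the standard deviation is a kind of average of them (the square root of the average squared deviation).

```python
import statistics

# Deviations: differences between each observation and the mean.
data = [4.0, 7.0, 9.0, 12.0]
mean = statistics.fmean(data)           # 8.0
deviations = [x - mean for x in data]   # [-4.0, -1.0, 1.0, 4.0]

# The sample standard deviation: the square root of the average
# squared deviation (averaging over n - 1 rather than n).
sd = statistics.stdev(data)
print(deviations, round(sd, 4))
```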


Error: A sort of deviation (see above), really. The difference between an observed value and its corresponding theoretical value (or to be more precise, usually its expected value, see below). So it doesn’t mean ‘a mistake’. In pretty well any statistical analysis, there will be an error associated with each observation, and that doesn’t mean the experimenters or statisticians messed up, just that there is some random variability from the theory or expected results.


Expected value: The expected value of a random quantity is a sort of average of the possible values it can take, allowing for the different probabilities of those values. So it’s not necessarily the value you’d expect the quantity to take. For instance if you toss three fair coins, the expected (average) number of heads is one and a half, but you wouldn’t actually expect to see one and a half heads when flipping three coins, because clearly that’s impossible.
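The coin example can be checked directly: sum each possible number of heads weighted by its binomial probability.

```python
from fractions import Fraction
from math import comb

# Expected number of heads in three tosses of a fair coin:
# sum over k of k * P(exactly k heads), with P from the binomial.
p = Fraction(1, 2)
expected_heads = sum(
    k * comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)
)
print(expected_heads)  # 3/2 - a value no single set of tosses can produce
```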


Hazard: Usually used (in statistics) in phrases like ‘hazard function’ or ‘hazard rate’ or ‘hazard ratio’. The hazard rate is the instantaneous rate at which something happens, and the ‘something’ isn’t always a bad thing. (It might be being cured of a disease, for instance.)


Improper: An improper probability distribution is one that says how relatively likely different numerical values are, but isn’t the usual sort of probability distribution, because (putting it crudely) the probabilities of all the values don’t add up to the right thing. Most commonly seen in the phrase ‘improper prior’ in Bayesian statistics.


Jackknife: A way of estimating quantities by successively leaving out one observation at a time from a sample. (The American statistician John Tukey gave it the name, because it’s a bit like a jack-knife or Swiss army knife in that it can be a bit rough and ready but can be used on a lot of different problems, even though individual problems may have a better, more specialised, solution.)
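A minimal sketch of the leave-one-out idea, on invented data: jackknifing the mean gives a standard error that (for this particular statistic) agrees exactly with the usual formula s/√n.

```python
import math

data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]  # made-up sample
n = len(data)

# Recompute the mean n times, leaving out one observation each time.
loo_means = [(sum(data) - x) / (n - 1) for x in data]
jack_mean = sum(loo_means) / n

# Jackknife standard error of the estimate.
jack_se = math.sqrt((n - 1) / n * sum((m - jack_mean) ** 2 for m in loo_means))

# Classical standard error of the mean, for comparison.
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
print(round(jack_se, 6), round(s / math.sqrt(n), 6))
```

For statistics more complicated than the mean, the jackknife earns its keep by giving a serviceable standard error when no tidy formula exists.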


Mode: The mode is the most common (or most probable) value in a data set or probability distribution. (Not really anything to do with fashion, or with a way of doing something.) Even more confusingly, within statistics the term tends to be used rather loosely and can mean slightly different things depending on the context.


Moment: A quantitative measure of a probability distribution or set of data. Basically, an average of a particular power (square, cube, fourth power or whatever) of the data. There are related meanings in physics and in other parts of mathematics. (But it has nothing to do with time.)
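A short sketch with invented data: the r-th raw moment averages the r-th powers of the values (so the first moment is the ordinary mean), while central moments average powers of the deviations from the mean (so the second central moment is the population variance).

```python
data = [1.0, 2.0, 2.0, 3.0, 5.0]  # illustrative values

def raw_moment(xs, r):
    """Average of the r-th powers of the values."""
    return sum(x**r for x in xs) / len(xs)

mean = raw_moment(data, 1)  # the first moment is just the mean
# The second central moment: average squared deviation from the mean,
# i.e. the (population) variance.
variance = raw_moment([x - mean for x in data], 2)
print(mean, variance)
```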


Moral graph: A moral graph is a concept in graph theory (which is used in statistics e.g. in Bayesian networks). This is about the sort of graph that consists of points (nodes) connected by edges (lines, or arrows), so it doesn’t even fit with what most people understand as a graph. Basically you start with a directed graph (where the edges are arrows that point from one node to another). A node has a child (in the jargon) if there is an arrow pointing from the ‘parent’ node to the ‘child’ node. To form a moral graph from a directed graph, if a child node has more than one parent node, you put in edges joining all the parents, and then turn it into an undirected graph by taking the heads off all the arrows. Actually ‘moral’ is connected to its everyday meaning, in a rather old-fashioned way, because when you join up the parents that weren’t previously joined, you’re ‘marrying’ them.
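The moralisation procedure is short enough to sketch directly. The directed graph here is a made-up example (A → C, B → C, C → D): ‘marry’ every pair of parents that share a child, then drop the arrow directions.

```python
from itertools import combinations

# Directed graph as child -> list of parents (invented example).
parents_of = {"C": ["A", "B"], "D": ["C"]}

edges = set()  # undirected edges, each stored as a frozenset of two nodes
for child, parents in parents_of.items():
    # Undirected versions of the original arrows.
    for p in parents:
        edges.add(frozenset((p, child)))
    # "Marry" every pair of parents of this child.
    for p, q in combinations(parents, 2):
        edges.add(frozenset((p, q)))

print(sorted(sorted(e) for e in edges))
```

Here C has two parents, A and B, so the moral graph gains the marrying edge A–B on top of the three undirected versions of the original arrows.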


Normal: As in ‘normal distribution’ - a particular sort of probability distribution, said to be ‘bell-shaped’ (but I never saw a bell that shape…)


Regression: A statistical technique for quantifying relationships between variables. So-called because of Francis Galton’s related notion of ‘regression toward the mean’, which was originally connected to the everyday use.


Reliability: There are lots of technical meanings, but one of them refers to a measure of how good something is at producing the same measurement whenever you do the measuring. But it doesn’t mean you can rely on that measurement – perhaps the measuring instrument or test always gives exactly half the correct value, in which case it’s not reliable at all in the everyday sense.


Significance: As mentioned above, very roughly, in statistics it’s a measure of the extent to which we can conclude an observed result isn’t simply due to random variability. There are serious arguments going on about the extent to which it really does measure that, but the point is that it does not necessarily tell you anything about how important or practical, in the real world, the result is.


Stress: In some statistical techniques, there’s a measure called ‘stress’ which is, roughly, a measure of how good a simplified representation of the data is. So in something called multidimensional scaling, you start with a list of how far apart a lot of individual things are, for example in three-dimensional space, and then you try to produce a plot of them in (say) two dimensions, in such a way that the distances between the individuals in two dimensions are as close as possible to the original three-dimensional distances you started with. The name ‘stress’ seems to come from the idea that you take the original points, which are in a lot of dimensions (mathematically), and you squidge them into just two dimensions (so that you can plot them on a piece of paper or computer screen), and the squidging will cause stress to the original configuration of points, so you want to make the stress as small as possible, and indeed measure how much stress there is.
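As a rough sketch of one common version of this measure (Kruskal’s stress-1; both configurations here are invented): compare the pairwise distances in the original space with those in the low-dimensional plot, and summarise the mismatch as a single number.

```python
import math
from itertools import combinations

# Invented example: three points in 3-D, and a candidate 2-D "plot" of them.
original = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (0, 1, 1)}
plotted  = {"a": (0, 0),    "b": (1, 0),    "c": (0, 1.3)}

def dist(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

pairs = list(combinations(original, 2))
d_orig = [dist(original[i], original[j]) for i, j in pairs]
d_plot = [dist(plotted[i], plotted[j]) for i, j in pairs]

# Stress-1: squared mismatch between the two sets of distances,
# normalised by the plotted distances. Zero means a perfect match.
stress = math.sqrt(
    sum((do - dp) ** 2 for do, dp in zip(d_orig, d_plot))
    / sum(dp**2 for dp in d_plot)
)
print(round(stress, 4))
```

Multidimensional scaling algorithms shuffle the plotted points around to make exactly this sort of quantity as small as possible.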


But why'd you have to go and make stats so complicated?

You might be wondering how all this sort of confusion arose. I think it’s not easy to get to the root of it, but it has something to do with the preferences of those who developed the subject of statistics, particularly in the first half of the twentieth century. The great British statistician and geneticist Sir Ronald Fisher (1890-1962) was very important and influential in popularising statistical methods and approaches. He, on the whole, favoured familiar-sounding terms rather than words with complicated Latin or Greek origins, which tend to be common in some other disciplines. But other statisticians have preferred to invent or promote words with more classical roots. The British statistician and mathematician Karl Pearson (1857-1936) preferred terms like “heteroscedasticity”, “kurtosis” and “histogram” – although probably the best-known statistical term associated with him is a way of measuring how closely two quantities are related, called the Pearson product-moment correlation coefficient, in which term “product” has nothing to do with the production of goods, and “moment” has nothing to do with time.

If you want to know more about the origins of statistical (and mathematical) terms, Jeff Miller's website on 'Earliest Known Uses of Some of the Words of Mathematics' is a great resource. Jeff is a Florida school teacher, but the site has many contributions from others – much of the statistical material was written by John Aldrich, a British academic economist and statistician, who has also produced an online index to the entries on probability and statistics on Jeff Miller’s site.




