Looking at population data


Tony Hirst and Hans Rosling take us on an adventure through population data and what it can tell us.

By: Dr Tony Hirst (Communication and Systems Department), Professor Hans Rosling (Gapminder)



 

A brief history of censuses

Census taking - recording the size of a country's population, and in many cases the age and gender of everyone making up that population - has a long, if patchy, history. It is thought that census taking originated over 3000 years ago, when the Babylonians used a census to aid in food planning. Two and a half thousand years ago, the Ancient Egyptians used censuses to help in labour management and planning land use. The dawn of the Common Era saw census taking in China and throughout the Roman Empire. Although the Domesday Book of 1086 thoroughly surveyed England, the emphasis there was more on cataloguing land and property than generating population estimates.

The modern age of census taking began two hundred years or so ago. In England, statistician John Rickman conducted the first population census in 1801, setting in train a data collection exercise that repeats every ten years, the most recent UK census being in 2011.

The national statistics agencies of many countries publish a history of the census in their country. A few examples include:

Is a history of the census in your country listed there? If not, see if you can find a telling of it from your own country's national statistical service. Or failing that, check Wikipedia, and see if you can find a link from there.

 

 

As Hans describes, data relating to population distributions is published by the United Nations Department of Economic and Social Affairs, Population Division. You can find the data he describes on the World Population Prospects webpage.

If you want to download copies of the data, from the Online Database area in the navigation bar on the left-hand side of the page, select Detailed Indicators then from the form that is displayed choose Population by five-year age group and sex to get the data. The data can be downloaded as a simple text file in the CSV format that you can then load into a spreadsheet or other data analysis tool.
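As a rough sketch of what loading such a file might look like in Python, the example below parses a small inline sample that stands in for the downloaded CSV. The column names (Location, Time, AgeGrp, PopMale, PopFemale), the country "Exampleland" and all of the figures are assumptions made for illustration; check the header row of the file you actually download and adjust the names to match.

```python
import csv
import io

# Made-up sample standing in for a downloaded file. The column names and
# the numbers (in thousands) are illustrative assumptions, not UN data.
SAMPLE_CSV = """Location,Time,AgeGrp,PopMale,PopFemale
Exampleland,2010,0-4,510,490
Exampleland,2010,5-9,505,488
Exampleland,2010,10-14,498,485
"""


def load_population(text):
    """Return a list of dicts with numeric population counts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "location": row["Location"],
            "year": int(row["Time"]),
            "age_band": row["AgeGrp"],
            "male": float(row["PopMale"]),
            "female": float(row["PopFemale"]),
        })
    return rows


rows = load_population(SAMPLE_CSV)
total = sum(r["male"] + r["female"] for r in rows)
print(len(rows), "age bands, total population (thousands):", total)
```

To read a real file instead, the same function could be given the contents of the file opened with Python's built-in open().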

The UN Data Explorer also links to various other population related datasets:

By looking at population and demographic data from a variety of perspectives, we can build a richer picture of how a population is structured than if we just focus on a single dataset.

If the thought of having to download the data and then analyse it yourself fills you with dread, don't panic! There are also several online tools available that allow us to explore the population data directly in a graphical way. But first, let's see how Hans makes sense of the data...

As Hans mentioned at the start of that clip, the chart is a simplified population pyramid. Population pyramids are widely used for depicting the age structure of a population. In the "one sided" form used by Hans, the bands count the number of people falling into a particular age band. In a more traditional "two sided" population pyramid, the age bands are split to show counts of males and females in each band.

If you go to the Demographic Profiles page on the United Nations Department of Economic and Social Affairs, Population Division website, you can select a country and display several different population pyramids for that country plotted using historical, recent and projected population data sets.

For example, if I select the United Kingdom region, we see the following population pyramids.

Image of population pyramids for the world in 1950, 2010, 2050 and 2100 Copyrighted image Copyright: The United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

In these two sided charts, the vertical y-axis depicts the different age bands, as before. However, this time the horizontal x-axis is split into two components, each showing a positive count that increases away from the centre line. On the left, we have the count for males (increasing count goes further to the left), and on the right, the number of females in each age band (increasing count to the right).
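The layout of a two sided pyramid can be sketched in a few lines of Python that draw the bars as text. This is only a minimal illustration: the age bands and the counts (in millions) are invented, and the function name pyramid_lines is hypothetical rather than part of any charting library.

```python
# Hypothetical counts in millions per age band, youngest band first.
AGE_BANDS = ["0-14", "15-29", "30-44", "45-59", "60-74", "75+"]
MALES = [6.0, 5.5, 5.0, 4.0, 2.5, 1.0]
FEMALES = [5.8, 5.4, 5.0, 4.2, 3.0, 1.6]


def pyramid_lines(bands, males, females, width=20):
    """Build text rows: male bars grow leftwards, female bars rightwards."""
    # Scale so that the largest count fills the full bar width.
    scale = width / max(max(males), max(females))
    lines = []
    # Oldest band first, so that it is drawn at the top of the pyramid.
    for band, m, f in reversed(list(zip(bands, males, females))):
        left = ("#" * round(m * scale)).rjust(width)
        right = ("#" * round(f * scale)).ljust(width)
        lines.append(f"{left} {band:^7} {right}")
    return lines


for line in pyramid_lines(AGE_BANDS, MALES, FEMALES):
    print(line)
```

Both sides use the same scale, so a longer bar on the right than on the left in the same row means more females than males in that age band.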

Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

One thing you may notice is that for the UK the pyramid is more or less symmetrical about the centre line except in the older age groups: this tells us that there are roughly the same number of males and females in each age band apart from in the latter stages of life, which more women tend to reach than men.

 

 

As Hans suggests, the fertility rate of a population - how many children tend to be born, on average, per woman - has a great influence on the number of people introduced into the base of the population pyramid. The OpenLearn article “How fertility rates affect population” provides a guide to using an interactive population pyramid chart for the UK published by the UK Office for National Statistics (ONS). The pyramid allows you to explore the effects of different fertility rates on population pyramid models.

Animating population pyramids can be a very powerful way of seeing how a population has developed over time. The UN population pyramid view shows us four static views of the population structure over time, but Hans Rosling's demonstrations used a single animated view to step between the population pyramids at different points in time.

The following chart type, originally created by Mike Bostock, is a variation of the idea of a population pyramid, presented in the form of a set of transparent, overlaid bars. It shows population distributions for a selected country since 1950. The horizontal x-axis shows each age band, and the vertical y-axis the number of people within that band. Another nice feature is that the period in which people in each particular age band were born is shown using a white year label sitting just above the x-axis.

This population pyramid is actually two sided, in that for each category there are actually two bars displayed, one overlaying the other. A pink column denotes the female population count within an age band, and a blue bar the male population count. Each bar is slightly transparent, giving a purple colour to the overlapped areas.

If you click into the chart and use the left- and right- arrow keys on your keyboard, you should be able to see how the population structure of Bangladesh in this case changed over the period 1950-2010.

As an example of how to read this chart, in 2010 there were just over 8 million males - the larger blue bar - in the second (5 to 10) age category and about 8 million females: the height of the overlapped purple column. For the same year, in the 25-30 age band, there were about 7 million females - the larger pink bar, this time - and just under 7 million males: the height of the overlapped purple column.

 

What stories do differently shaped population pyramids tell?

The shapes of the population pyramids from different countries - or even the same country at different times - can be used to help us tell stories about the possible state of development (in population terms at least) of a country.

Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.
 

For example, in the above chart, which shows the population pyramids for Bangladesh in 1950 and 2010, we see two very different shapes. What story does the first one, for 1950, tell? To me, it suggests a high birth rate and consistent death rates through each decade of life. If mortality rates within each decade reduce, this population could start to grow quickly. In the second pyramid, from age 10 and upwards, we see a similar shape, but there are fewer children aged 5-10 and fewer again aged 0-5. Has the child mortality rate suddenly increased? Or has the fertility rate decreased? And what factors might contribute to that? More people having children later in life, perhaps?

What shapes for the population pyramids would you expect to see for a country like France or Austria between 1950 and 2010? For a country with low child mortality rate and an effective health care system, what shape would you expect if the fertility rate was running at or around the replacement rate (roughly 2 children per woman)? What shape might you expect for a declining population (that is, one that is getting smaller over time)?

Have a look at the population pyramids from different countries listed on the UN Demographic Profiles page to see if your predictions - and your assumptions about various countries - were right.

Expert hint: if you use the R programming language, you can generate interactive international population pyramids with rCharts. Kyle Walker, assistant professor in the Department of History & Geography at Texas Christian University, explains how.

Generally, there are several elements of the population pyramid we might look for:

- how steeply are the sides of the pyramid angled at the top of the pyramid? That is, at what rate are older members of the population dying?

- is there a “belt” - that is, is the pyramid narrower at the bottom, as well as the top, than a point somewhere in between? Has the birth rate started to fall, or child mortality rate increased, and if so, when?

- if the base of the pyramid has straight sides, how far up do the sides extend before the angled pyramid shape begins? Straight sides suggest the population is running at replacement rate. The higher the sides extend, the longer and healthier a life people can expect.

- are the two sides of the population pyramid roughly symmetrical?

The following slideshow reviews several ways of interpreting differently shaped population pyramids.

All of these characteristics tell us something about the structure and evolution of the population - although we may need to look at other indicators (such as child mortality rates to age 5, or fertility rate by age band) to get a better sense of what is causing the population to evolve in the way it has.
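Two of the checks in the list above, the “belt” and the height of the straight sides, can be turned into simple tests on a list of age band counts. The sketch below is one possible way of doing so, not an established method: the function names, the 5% tolerance and the three sample pyramids are all assumptions chosen for illustration.

```python
def has_belt(counts, tolerance=0.05):
    """True if some band is clearly wider than the base (youngest) band."""
    return max(counts) > counts[0] * (1 + tolerance)


def straight_side_bands(counts, tolerance=0.05):
    """Count bands, from the base up, that stay within tolerance of the base."""
    base = counts[0]
    n = 0
    for c in counts:
        if abs(c - base) > tolerance * base:
            break
        n += 1
    return n


# Hypothetical pyramids (counts per age band, youngest band first).
EXPANDING = [10, 8, 6.5, 5, 3.5, 2, 1]     # wide base, steadily narrowing
BELTED = [6, 7, 9, 8, 6, 3, 1]             # narrower base than middle
STATIONARY = [5, 5.1, 5, 4.9, 4, 2.5, 1]   # straight sides, then tapering

for name, counts in [("expanding", EXPANDING),
                     ("belted", BELTED),
                     ("stationary", STATIONARY)]:
    print(name, "belt:", has_belt(counts),
          "straight-sided bands:", straight_side_bands(counts))
```

The tolerance stops small wobbles between neighbouring bands from being reported as a belt; how large it should be is a judgment call that depends on the data.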

Take a look at the following chart, which shows the population pyramid for Bahrain in 2010:

Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

In words, what do you think the pyramid for 2010 shows? Where did all the people come from, particularly all the men, and even more particularly, the males over 25 years old?

Digging inside the data

Noting that something is apparently “odd” in the distribution is one thing, but how might we then go about trying to understand what possible stories might explain it?

One way is to try to “triangulate” several different data sets to see whether some sort of pattern or relationship holds between them. Hans Rosling often uses motion charts to help him look for repeated or common patterns of behaviour between different indicators across countries that we might group in particular ways. Another way is to plot multiple lines on a chart, or place complementary line charts side by side, so we can look for similarities and differences between them.

From the UN Population Prospects Demographic Profiles, select Bahrain as the country of interest and then click on the Line Charts tab to display a variety of line charts depicting how the values of several other indicators have changed over time.

If we look at the birth rates and child mortality figures for Bahrain we see a falling fertility rate and improving mortality rates for the under 5s. The fertility rate is well above replacement rate, so we’d expect population growth, but it is unlikely that births alone would give such wildly different proportions of males and females.

Copyrighted image Copyright: United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

Another way of trying to understand the reason for the different population pyramid shapes for Bahrain between 1950 and 2010 is to look for more detail “inside” the population data itself by splitting it up into several different components that together make up the whole. For example, when looking at world population trends we could just look at the total world population value, or we could look at total population values for separate regions of the world, or within those population trends for a particular country. The aggregated world population data set would tell one story, but the regional and country level trends may tell very different ones.

If we look at the total population over time, we see a steady increase in population from 1950 to 1970 or so, increased growth from 1970 to 2000, then a sudden jump from 2000 to 2010. If we now look at the total population by age group over time, we notice a big jump in the number of 15-64 year olds starting around 2000 that cannot be explained by births (there isn’t a large number of 0-14 year olds in the years leading up to 2000 to account for the increase).

Copyrighted image Copyright: The United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

The increase in 15-64s between 1970 and 2000 may in part relate to the longer life expectancy and improved child mortality rate as children move from the 0-14 band into the 15-64 category, but this can be quite hard to think through. If we had equal width age bins (say, 0-14, 15-29, 30-44, and so on) and statistics collected every 15 years, we should see a steady transfer of people from one age band to the next as we collect samples, 0 year olds becoming 15 year olds, 20 year olds becoming 35 year olds, and so on. However, with different width age bands (0-14, 15-64, 65+) and data collected every 10 years, it can be hard to imagine how the counts in the 0-14 age band translate into changes in the counts given for the 15-64 age band over time, particularly if we don’t know how the population is initially structured in the 15-64 age band. That said, such a large jump in the number of 15-64 year olds must be due to some other factor, mustn’t it?
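The steady transfer between equal width age bands can be sketched as a small function that moves everyone up one band per step. This is a simplified model under stated assumptions: the population is closed (no migration), the step between samples equals the band width, the top band is open-ended, and the counts, births and survival shares are invented for illustration.

```python
def age_forward(counts, births, survival):
    """Advance a closed population by one step equal to the band width.

    counts[i] is the number of people in band i (youngest band first);
    the last band is open-ended. survival[i] is the assumed share of
    band i that is still alive one step later.
    """
    survivors = [c * s for c, s in zip(counts, survival)]
    # Everyone moves up one band; the youngest band is refilled by births.
    new_counts = [births] + survivors[:-1]
    # The open-ended top band also keeps its own survivors.
    new_counts[-1] += survivors[-1]
    return new_counts


# Hypothetical bands: 0-14, 15-29, 30-44, 45+ (counts in thousands).
counts = [100, 90, 80, 40]
survival = [1.0, 1.0, 0.9, 0.5]
print(age_forward(counts, births=110, survival=survival))
```

With unequal bands such as 0-14, 15-64 and 65+, no such simple shift exists, which is why the broad age group chart is harder to reason about.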

The following chart provides another view over the population counts for Bahrain within selected age ranges. The data used to generate the set comes from the UN Population Division’s Online Database - Detailed Indicators - Population by five-year age group and sex dataset. (The data for all countries can also be downloaded as Excel data tables from the Tables in EXCEL-Format area or in a single, large CSV file from the Data in ASCII-Format - Extended Data Set area.)

Playing through the population chart below, we can get a feel for how rapid the explosion in population size was.

Click on the graphic then use the left- and right- keyboard arrows to move through the years.

Explore the chart to see how the relative number of people in each age range compare over time by gender, paying particular attention to the maximum relative amount over time achieved in each case. How do the charts for Males and Females compare? Note that to get into the 25-29 age band requires passing through the 0-4 age band. That is, we might expect the values for the 25-29 age range to lag that of the 0-4 age range. For example, if the population is a closed one (the only way for new people to join the population is to be born into it), and nobody who is born dies until after they are thirty, we should be able to move the bar for the 0-4 year age range 25 years to the left in order to get the bar for the 25-29 year olds 25 years later. This chart communicates in a powerful way how members are introduced into the population through birth (or at least, by the introduction of people in the 0-4 age range) by the way they enter the chart from the right hand side as time moves forward.

So how should we read the chart if there are more people in an age band such as the 25-29 age band than have ever appeared in the 0 to 4 age band? What may have caused such an increase in population, particularly amongst males? And what sorts of dataset do you think might be used to provide evidence in support of any such explanation?

If the population increase can’t be accounted for by births (people joining the population by being born into it), or a huge reduction in mortality rates (people leaving the population by dying), it must be accounted for either by people failing to leave the population by other means (for example, emigration) or by people joining the population by other means (for example, immigration). Data relating to migration into and out of Bahrain over the 1950-2010 period should help us explore the extent to which this is the reason for the major population change.
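This reasoning is often written as a balancing equation. The notation here is introduced for illustration and does not come from the UN data: P_t is the population at the start of a period, and B_t, D_t, I_t and E_t are the births, deaths, immigrants and emigrants during that period.

```latex
P_{t+1} = P_t + B_t - D_t + I_t - E_t
```

Rearranging shows how net migration can be estimated when it is not recorded directly, as whatever population change is left over once births and deaths have been accounted for:

```latex
I_t - E_t = (P_{t+1} - P_t) - (B_t - D_t)
```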

Test your understanding

For the Bahrain data, how else might you have looked at it to test whether birth rates may have been responsible for the sudden population change?

One way would have been to look at the year 2000 data for the 25 to 29 age range, for example, and compare it with the 1995 data for the 20-24 age range, or the 1990 data for the 15-19 age range. (Someone aged 16 in 1990 would be aged 26 in 2000.) If population increases were accounted for solely by births within the country, the number of people in the 25 to 29 age range in 2000 would have to be equal to or less than the number of people in the 15-19 age range in 1990 (although other factors as well as birth rate may account for the number of 25-29 year olds in 2000). If the number of people in the 25 to 29 age range in 2000 exceeded the number of people in the 15-19 age range in 1990, some factor other than domestic births must be accounting for the population increase in that age band at least.
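The cohort comparison described above can be sketched as follows. The counts are invented placeholders, not the UN figures for Bahrain, and the function name cohort_excess is hypothetical; substitute the real values from the downloaded data to run the check for yourself.

```python
# Hypothetical counts in thousands, keyed by (year, age band).
POPULATION = {
    (1990, "15-19"): 40,
    (1995, "20-24"): 48,
    (2000, "25-29"): 65,
}


def cohort_excess(population, start, end):
    """Return how many more people are in the cohort at end than at start.

    A positive value cannot come from births inside a closed population,
    because nobody can join an existing birth cohort by being born.
    """
    return population[end] - population[start]


excess = cohort_excess(POPULATION, (1990, "15-19"), (2000, "25-29"))
if excess > 0:
    print(f"Cohort grew by {excess} thousand: look beyond births "
          "(for example, at migration).")
else:
    print("Cohort did not grow: consistent with a closed population.")
```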

Looking for discontinuities

One of the most powerful attractions for using charts is that discontinuities, or sudden changes in value, can often jump out at you.

Using the UN Department of Economic and Social Affairs, Population Division Demographic Profiles page, look at the population pyramid for Bangladesh.

Copyrighted image Copyright: The United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

In development terms, it looks as if Bangladesh has turned a corner in terms of its population profile, but nothing dramatic appears to have happened. But could this chart be hiding a story that is only revealed by looking at the data in a different way? Using the UN demographic profiles data, have a look at the data shown in the Line Charts for Bangladesh.

Copyrighted image Copyright: The United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

What happened to child mortality and life expectancy around about 1970, where the blue line makes an abrupt jump? The regional curves have a relatively smooth shape to them (no drastic changes) but something catastrophic appears to have happened in Bangladesh in the early to mid 1970s going by those two data sets. What could have caused the increased child mortality and decreased life expectancy, and left the fertility rate in its steady trajectory? Famine? Disease? War? How would you set about finding out?

There are two things worthy of note here. The first is that one chart or dataset may not reveal something that another view over a related dataset might disclose. This is true of many other sorts of statistical representation, such as summary statistics like averages. The second is that discontinuities and abrupt changes in graphical line charts are immediately noticeable in a way that “out of sorts” numerical values listed in a data table may not be. Such discontinuities often signal that something happened, or that there’s a story to be told (or maybe that there is something wrong with the data at that point!); but such discontinuities don’t always tell you what happened, or what the causes may have been. For that, we often need to look to other data sets or look at the data from other points of view.
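When the values are only available as a table, a simple calculation can stand in for the eye. The sketch below flags any step between consecutive values that is much larger than the typical step. It is one possible heuristic, not a standard test: the factor of 3 is an arbitrary assumption, and the series is invented for illustration rather than taken from the UN figures for Bangladesh.

```python
from statistics import median


def find_jumps(years, values, factor=3.0):
    """Flag steps whose absolute change is far larger than the typical step."""
    steps = [b - a for a, b in zip(values, values[1:])]
    typical = median(abs(s) for s in steps)
    jumps = []
    for year, step in zip(years[1:], steps):
        if typical > 0 and abs(step) > factor * typical:
            jumps.append((year, step))
    return jumps


# Hypothetical life expectancy series (in years) at five-year intervals.
YEARS = [1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985]
LIFE_EXPECTANCY = [40, 42, 44, 46, 39, 48, 50, 52]

for year, step in find_jumps(YEARS, LIFE_EXPECTANCY):
    print(f"Abrupt change of {step:+} years in the period ending {year}")
```

As with the chart, a flagged step only says that something changed, not why; it is a prompt to go and look at other data.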

Finally, let's have a look at how the total population in Bangladesh has changed over time.

Copyrighted image Copyright: The United Nations Source: United Nations, Department of Economic and Social Affairs, Population Division (2013). World Population Prospects: The 2012 Revision.

Do you notice anything interesting about the chart showing total population by major groups? I spotted a couple of things in particular.

  • the red line for the 15-64 age group represents a much larger population than the lines for the 0-14 (blue) and 65+ age ranges. This should not surprise us - 15-64 represents three 15 year groups compared to the single 0-14 group and an approximate 65-90 group;
  • the humps in the curves lag one another - that is, the 0-14 age range peaks (reaches a maximum value before falling off slightly) first, in 2000, followed by the 15-64 age range in 2025 and the 65s and over in 2075. Again, a little bit of thought suggests why this might be the case - migration aside, you need to be counted in the 0-14 age range in the years before you can be counted in the 15-64 or 65+ age ranges.
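The lag between the humps can be checked by finding the year in which each series peaks. In the sketch below the three series are invented so that they peak in the years mentioned above; they are not the UN projections for Bangladesh, and the function name peak_year is hypothetical.

```python
def peak_year(years, values):
    """Return the year in which a series first reaches its maximum."""
    return years[values.index(max(values))]


# Hypothetical population series (in millions) for three broad age groups.
YEARS_25 = [1950, 1975, 2000, 2025, 2050, 2075, 2100]
AGE_0_14 = [20, 35, 50, 45, 40, 35, 30]
AGE_15_64 = [25, 40, 80, 110, 105, 95, 85]
AGE_65_PLUS = [2, 3, 6, 15, 30, 40, 38]

for label, series in [("0-14", AGE_0_14),
                      ("15-64", AGE_15_64),
                      ("65+", AGE_65_PLUS)]:
    print(label, "peaks in", peak_year(YEARS_25, series))
```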

The ability to read charts and the patterns that are hidden within them is a skill that we often don't develop as much as we could. By looking for stories in charts, and trying to make sense of one line, for example, in the context of another, we can start to identify some of those thousand words that pictures are often claimed to save ("a picture is worth a thousand words")!
