
Diary of a data sleuth: Getting to grips with the census data

Updated Friday 14th December 2012

With the recent release of key statistics around the UK's 2011 census by the Office for National Statistics, our resident data sleuth set out to see what he could find.

A few days ago, a set of key statistics around the UK's 2011 census was released by the Office for National Statistics (ONS). For amateur statisticians, releases such as this provide a wealth of data that can be explored. For the data sleuth, official data releases supposedly make discovering the data sets relatively straightforward, although evidence recently presented to the House of Commons Public Administration Committee (PASC) has been highly critical of the Office for National Statistics website in general (BBC: ONS website is improving, Andrew Dilnot tells critical MPs; watch Andrew Dilnot's evidence to the PASC, December 2012). If we start at the beginning, though, with a general web search engine and the task of tracking down census data, how well do we get on? A quick search for "uk census 2011 data" turns up the Census 2011 area of the ONS website.

The census website provides three ways in to exploring more about the census data:

[Image: screenshots of the census website, from the Office for National Statistics website. Copyright: ONS]

The census analysis ("Stories") section takes us to a series of articles that help us get some sort of understanding about the make-up of the population of England and Wales (on census day, at least!). Each article addresses a particular topic, starting with religious affiliations, or ethnicity and national identity. As Professor Kevin McConway describes in his compelling inaugural lecture, statisticians love to compare things, and in the case of the census, one obvious comparison to make is between regional or local distributions, comparisons which are often illustrated with maps.

The articles also hint at how we can start to pull stories out of data. It is quite possible that section headings along the lines of "Religious affiliation across the English regions and Wales" arose from a statistician asking him- or herself the question: "Do religious affiliations differ across regions, or within local areas?"; looking at the data through this lens; refining this question as "How do religious affiliations differ across regions, or within local areas?"; and then reporting back on what they found. In many cases, one or more stories may suggest themselves as to why the population is distributed in this way, which can lead to a further round of analysis or the start of a journalistic or academic research project.
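To make the "how do they differ?" step a little more concrete, here is a minimal sketch in Python of one way such a comparison might be set up. It is not the ONS's method: the region names, category names and counts are invented placeholders rather than census figures, and the function names are hypothetical. It simply turns counts into shares and reports how far each region sits from the overall share.

```python
# Illustrative only: the regions, categories and counts below are invented
# placeholders, not figures from the 2011 Census.
counts = {
    "Region A": {"Religion X": 600, "No religion": 400},
    "Region B": {"Religion X": 450, "No religion": 550},
}

def shares(table):
    """Convert raw counts per region into proportions that sum to 1."""
    result = {}
    for region, row in table.items():
        total = sum(row.values())
        result[region] = {category: n / total for category, n in row.items()}
    return result

def overall_share(table, category):
    """Proportion of the combined population that falls in one category."""
    total = sum(sum(row.values()) for row in table.values())
    in_category = sum(row.get(category, 0) for row in table.values())
    return in_category / total

def differences(table, category):
    """How far each region's share sits above or below the overall share."""
    base = overall_share(table, category)
    return {region: row.get(category, 0.0) - base
            for region, row in shares(table).items()}

diffs = differences(counts, "No religion")
# List regions from furthest above the overall share to furthest below.
for region, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{region}: {diff:+.1%} relative to the overall share")
```

A positive value flags a region where the category is more common than in the population as a whole; whether that difference is a story worth telling is still a judgement for the analyst.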

As well as illustrated text articles, some of the topics are also summarised by means of narrated videos, which are collated on the ONS Stats YouTube channel.

Religion in England and Wales

If you want to dig into the numbers behind the charts, links are provided to files containing the data in the Microsoft Excel (.xls) spreadsheet format. This is slightly irritating, because it presupposes that we have a spreadsheet application to hand to view the data, or access to an online spreadsheet application such as Google Sheets. While spreadsheets are among the most widely used tools for working with data, alternative approaches to working with data and creating data visualisations are increasingly to be found. One of my favourites is DataWrapper, a package built originally to support the work of data journalists, which lets you visualise simple data sets with a range of attractive charts that can be embedded in your own web pages. There are also power tools available for working with data in its rawest form, such as "data mechanics" tools like OpenRefine (which amongst other things will open data files, let you preview them, and then convert the data to other file formats), and statistical programming tools such as R/RStudio.

Unfortunately, the map-based figures don't give you direct access to the data. True, the interactive census tools do allow you to experiment, to a certain extent, with various map views, allowing you to select what values you want to compare on a geographical basis as well as over time in the form of a side by side view of the 2001 and 2011 census data.

But no data... Maybe it's in the data area of the site, hidden away amongst the Reference Tables? There are certainly lots of data there, but it's maybe not clear to all but data wrangling experts how you might take that data yourself and get it onto a map.

The rise of online and computer mapping tools, triggered in part by the expectations raised by Google Maps about what maps can - and should - be able to do, has made producing digital maps relatively easy:

  1. if you know what tools to use, and
  2. you're prepared to get your hands bitty. (Bitty as in bits... bits and bytes? Bits versus atoms? Oh, never mind...)

In brief, there are two basic approaches to producing 'coloured in' maps (a particular form of thematic map also known as choropleth maps). The first is to load your data into a tool that already knows about the shapes of different explicitly identified geographical regions. If you identify each region with an appropriate code, the area of a map associated with this identification code value can then be coloured in appropriately.
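As a rough illustration of the first approach, the sketch below shows the kind of lookup a code-aware mapping tool might perform behind the scenes: each area code is matched to a statistic, and the statistic is binned into one of a handful of colour classes. The area codes, values and palette are all hypothetical, and equal-width binning is just one of several possible classing schemes.

```python
# Hypothetical area codes and values, invented for illustration only.
statistics = {"X01": 12.5, "X02": 48.0, "X03": 30.0, "X04": 75.5}

# A light-to-dark palette; these particular hex colours are an arbitrary choice.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

def equal_interval_class(value, low, high, n_classes):
    """Return the 0-based class index of value using equal-width bins."""
    if high == low:
        return 0  # every region has the same value, so use a single class
    width = (high - low) / n_classes
    index = int((value - low) / width)
    return min(index, n_classes - 1)  # the maximum value belongs to the top class

def colour_by_code(stats, palette):
    """Map each area code to a fill colour based on its statistic."""
    low, high = min(stats.values()), max(stats.values())
    return {
        code: palette[equal_interval_class(value, low, high, len(palette))]
        for code, value in stats.items()
    }

fills = colour_by_code(statistics, PALETTE)
for code in sorted(fills):
    print(code, statistics[code], fills[code])
```

The mapping tool then only needs to know which shape belongs to which code; the data file itself never has to carry any geometry.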

[Image: screenshots showing census area codes, from the Office for National Statistics website. Copyright: ONS]

The second approach is to bundle so-called "shapefile" data along with your actual data. A plotting tool can then be used to draw out the regional boundaries from the shape data, and then fill each region with a colour according to one of the statistical values associated with it. If you look carefully at the data sets, you will see that where regions are mentioned, they are identified by an explicit Area Code value. These codes may be recognised by mapping tools of the first type, tools such as OpenHeatMap. The ONS also publishes "shape" data for each of the Area Code identified regions, which can be used to power mapping tools of the second type. Conveniently, the Census Dissemination Unit, operated by Mimas at The University of Manchester, publishes combined data and shape-data files that make it relatively easy to create your own maps. So for example, here's one I created this morning (it's easy when you know how).
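For a feel of what the second approach involves, here is a toy sketch in Python that draws each region's boundary and fills it according to its statistic, writing the result as SVG. It is not how the demo map was made. The regions are hypothetical squares with invented values; a real shapefile would need a dedicated reader and a map projection, both of which are left out here.

```python
# Hypothetical regions: each record bundles a boundary (a list of x, y points)
# with a statistic, loosely mimicking a combined data-and-shape file.
regions = [
    {"code": "X01", "value": 10.0,
     "boundary": [(0, 0), (50, 0), (50, 50), (0, 50)]},
    {"code": "X02", "value": 40.0,
     "boundary": [(50, 0), (100, 0), (100, 50), (50, 50)]},
]

def grey_fill(value, low, high):
    """Map a value to a grey level: low values are light, high values dark."""
    fraction = 0.0 if high == low else (value - low) / (high - low)
    level = round(230 - fraction * 180)  # from 230 (light) down to 50 (dark)
    return f"rgb({level},{level},{level})"

def to_svg(region_list, width=100, height=50):
    """Draw each boundary as a polygon filled according to its statistic."""
    values = [region["value"] for region in region_list]
    low, high = min(values), max(values)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for region in region_list:
        points = " ".join(f"{x},{y}" for x, y in region["boundary"])
        fill = grey_fill(region["value"], low, high)
        code = region["code"]
        parts.append(f'<polygon points="{points}" fill="{fill}" '
                     f'stroke="black"><title>{code}</title></polygon>')
    parts.append("</svg>")
    return "\n".join(parts)

svg = to_svg(regions)
print(svg)
```

Saving the printed text to a file with an .svg extension and opening it in a web browser should show two adjoining squares, the one with the higher value shaded darker.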

[Image: an example demo map of Census 2011 data]

For more about online mapping, see OpenLearn: Visual representations of data and information - Geographical data.

In the field of astronomy, experienced amateur astronomers have long made, and continue to make, significant contributions to professional astronomy. In part, this is because there's a big sky out there and there aren't enough professional astronomers to watch it all, all of the time! With the release of ever more public datasets, such as the 2011 Census data, might there be similar opportunities for dedicated amateur statisticians to help make sense of it all? As citizen journalists, might we be able to spot interesting stories within the data, particularly at a local or regional level familiar to us, that may have been missed in national reports? And as creative technologists, might we be able to find new ways of exploring or visualising the data that make it easier to analyse for amateurs and professionals alike?

As a recently advertised job for the "Head of Rich Content" at the ONS suggests, the ONS website should anticipate and meet user needs and expectations, including those of the Citizen User [my emphasis]. Are you one of those users? And if so, how are you using the ONS website in general, and the ONS census website in particular? Let us know via the comments below.

[DISCLAIMER: In the interests of full disclosure, OpenLearn’s Laura Dewis will be leaving the OU to work for the ONS in early 2013. This article reflects the opinions of Tony Hirst, the author, and Laura wasn’t involved in writing or publishing any aspect of it. Everyone at OpenLearn looks forward to seeing how the ONS website develops with Laura on board their team.]

 
