Diary of a data sleuth: Getting to grips with the census data

Updated Friday 14th December 2012

With the recent release of key statistics around the UK's 2011 census by the Office for National Statistics, our resident data sleuth set out to see what he could find.

A few days ago, a set of key statistics around the UK's 2011 census was released by the Office for National Statistics (ONS). For amateur statisticians, releases such as this provide a wealth of data that can be explored. For the data sleuth, official data releases supposedly make discovering the data sets relatively straightforward, although evidence recently presented to the House of Commons Public Administration Committee (PASC) has been highly critical of the Office for National Statistics website in general (BBC: ONS website is improving, Andrew Dilnot tells critical MPs; watch Andrew Dilnot's evidence to the PASC, December 2012). If we start at the beginning, though, with a general web search engine, and the task of tracking down census data, how well do we get on? A quick search for "uk census 2011 data" turns up the Census 2011 area of the ONS website.

The census website provides three ways in to exploring more about the census data:

Screenshots from the Office for National Statistics website: the census website. Copyright: ONS

The census analysis ("Stories") section takes us to a series of articles that help us get some sort of understanding about the make-up of the population of England and Wales (on census day, at least!). Each article addresses a particular topic, starting with religious affiliations, or ethnicity and national identity. As Professor Kevin McConway describes in his compelling inaugural lecture, statisticians love to compare things, and in the case of the census, one obvious comparison to make is between regional or local distributions, comparisons which are often illustrated with maps.

The articles also hint at how we can start to pull stories out of data. It is quite possible that section headings along the lines of "Religious affiliation across the English regions and Wales" arose from a statistician asking him- or herself the question: "Do religious affiliations differ across regions, or within local areas?"; looking at the data through this lens; refining this question as "How do religious affiliations differ across regions, or within local areas?"; and then reporting back on what they found. In many cases, one or more stories may suggest themselves as to why the population is distributed in this way, which can lead to a further round of analysis or the start of a journalistic or academic research project.
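To make the "do they differ, and how?" question a little more concrete, here is a minimal sketch in Python of the kind of comparison involved: turning raw counts into percentage shares so that regions of different sizes can be set side by side. The region names, group names and counts are all invented for illustration and are not census figures, and the function names are my own.

```python
# Made-up counts purely for illustration; these are NOT census figures.
counts = {
    "Region A": {"Group X": 600, "Group Y": 300, "Group Z": 100},
    "Region B": {"Group X": 450, "Group Y": 150, "Group Z": 400},
}


def shares(region_counts):
    """Convert the raw counts for one region into percentage shares."""
    total = sum(region_counts.values())
    return {group: 100.0 * n / total for group, n in region_counts.items()}


def compare(counts, group):
    """Return (region, share) pairs for one group, largest share first."""
    rows = [(region, shares(c)[group]) for region, c in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# "Does the share of Group Z differ across regions?"
for region, pct in compare(counts, "Group Z"):
    print(f"{region}: {pct:.1f}%")
```

Working in shares rather than raw counts is what makes the comparison fair: a large region will nearly always have more people in every group, which tells us little about how its make-up differs.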

As well as illustrated text articles, some of the topics are also summarised by means of narrated videos, which are collated on the ONS Stats YouTube channel.

Religion in England and Wales

If you want to dig into the numbers behind the charts, links are provided to files containing the data in the Microsoft Office Excel/.xls spreadsheet format. This is slightly irritating, because it presupposes that we have a spreadsheet application to hand to view the data or access to an online spreadsheet application such as Google Sheets. While spreadsheets provide one of the most widely used tools for working with data, alternative approaches to working with data and creating data visualisations are increasingly to be found. One of my favourites is DataWrapper, a package built originally to support the work of data journalists, which lets you visualise simple data sets with a range of attractive charts that can be embedded in your own web pages. There are also power tools available for working with data in its rawest form, such as "data mechanics" tools like OpenRefine (which amongst other things will open data files, let you preview them, and then convert the data to other file formats), and statistical programming tools such as R/RStudio.

Unfortunately, the map-based figures don't give you direct access to the data. True, the interactive census tools do allow you to experiment, to a certain extent, with various map views, allowing you to select what values you want to compare on a geographical basis as well as over time, in the form of a side-by-side view of the 2001 and 2011 census data.

But no data... Maybe it's in the data area of the site, hidden away amongst the Reference Tables? There is certainly lots of data there, but unless you are a data wrangling expert, it may not be clear how you might take that data yourself and get it onto a map.

The rise of online and computer mapping tools, triggered in part by the expectations raised by Google Maps about what maps can - and should - be able to do, has made producing digital maps relatively easy:

  1. if you know what tools to use, and
  2. you're prepared to get your hands bitty. (Bitty as in bits... bits and bytes? Bits versus atoms? Oh, never mind...)

In brief, there are two basic approaches to producing 'coloured in' maps (a particular form of thematic map also known as choropleth maps). The first is to load your data into a tool that already knows about the shapes of different explicitly identified geographical regions. If you identify each region with an appropriate code, the area of a map associated with this identification code value can then be coloured in appropriately.

Screenshots from the Office for National Statistics website: Census area codes. Copyright: ONS
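As a rough sketch of what a tool of this first type does behind the scenes, the Python below turns a table of values keyed by area code into a table of fill colours keyed by the same codes. The codes ("CODE_001" and so on), the values and the colour ramp are all placeholders I have made up; a real tool would match genuine ONS Area Codes against boundaries it already holds, and may well use a different way of grouping values into classes than the equal-width bands assumed here.

```python
# Hypothetical area codes and made-up values; real ONS Area Codes and
# census statistics would be read from the published data files instead.
values_by_code = {
    "CODE_001": 12.0,
    "CODE_002": 35.5,
    "CODE_003": 48.0,
    "CODE_004": 20.0,
}

# Light-to-dark colour ramp; the number of colours sets the number of classes.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def colour_for(value, lowest, highest, palette=PALETTE):
    """Pick a colour using equal-width classes between lowest and highest."""
    if highest == lowest:
        return palette[0]
    fraction = (value - lowest) / (highest - lowest)
    # The top value would otherwise fall just past the last class.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


def colour_lookup(values_by_code):
    """Return a dict mapping each area code to a fill colour."""
    lowest = min(values_by_code.values())
    highest = max(values_by_code.values())
    return {
        code: colour_for(value, lowest, highest)
        for code, value in values_by_code.items()
    }


fills = colour_lookup(values_by_code)
for code, colour in sorted(fills.items()):
    print(code, colour)
```

The important point is that the area code is the only thing tying the statistic to a place on the map, which is why consistent codes in the published data matter so much.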

The second approach is to bundle so-called "shapefile" data along with the statistics themselves. A plotting tool can then be used to draw out the regional boundaries from the shape data, and then fill the regions with a colour according to one of the statistics values associated with the region. If you look carefully at the data sets, you will see that where regions are mentioned, they are identified by an explicit Area Code value. These codes may be recognised by mapping tools of the first type, tools such as OpenHeatMap. The ONS also publishes "shape" data for each of the Area Code identified regions, which can be used to power mapping tools of the second type. Conveniently, the Census Dissemination Unit, operated by Mimas at The University of Manchester, publishes combined data and shape-data files that make it relatively easy to create your own maps. So for example, here's one I created this morning (it's easy when you know how).

An example demo of Census 2011 data
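This is not how the demo map above was made, but the second approach can be sketched in a few lines of Python: each region carries its own outline along with its statistic, and the plotting step simply draws each outline and fills it. To keep the sketch self-contained I have assumed a toy stand-in for shape data (plain lists of x, y points) and written the result out as SVG; real shapefiles are a binary format that needs a dedicated reader, and the codes, outlines and values below are all invented.

```python
# A tiny stand-in for "shape" data: each region has an outline (a list of
# x, y points) plus the statistic to plot. Everything here is made up.
regions = [
    {"code": "CODE_001", "value": 10.0,
     "outline": [(0, 0), (50, 0), (50, 50), (0, 50)]},
    {"code": "CODE_002", "value": 30.0,
     "outline": [(50, 0), (100, 0), (100, 50), (50, 50)]},
    {"code": "CODE_003", "value": 50.0,
     "outline": [(0, 50), (100, 50), (100, 100), (0, 100)]},
]


def grey_for(value, lowest, highest):
    """Map a value to a grey fill: higher values are drawn darker."""
    if highest == lowest:
        fraction = 0.0
    else:
        fraction = (value - lowest) / (highest - lowest)
    level = round(230 - fraction * 180)  # 230 (light) down to 50 (dark)
    return f"rgb({level},{level},{level})"


def to_svg(regions, size=100):
    """Draw every region outline as an SVG polygon filled by its value."""
    lowest = min(r["value"] for r in regions)
    highest = max(r["value"] for r in regions)
    shapes = []
    for r in regions:
        points = " ".join(f"{x},{y}" for x, y in r["outline"])
        fill = grey_for(r["value"], lowest, highest)
        shapes.append(
            f'<polygon id="{r["code"]}" points="{points}" '
            f'fill="{fill}" stroke="black"/>'
        )
    body = "\n  ".join(shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f"\n  {body}\n</svg>"
    )


svg = to_svg(regions)
print(svg)
```

Saving the printed text to a file with an .svg extension and opening it in a web browser should show three shaded blocks. Swapping the toy outlines for real boundary data is where the hands get bitty.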

For more about online mapping, see OpenLearn: Visual representations of data and information - Geographical data.

In the field of astronomy, experienced amateur astronomers have in the past made, and continue to this day to make, significant contributions to professional astronomy. In part, this is because there's a big sky out there and there aren't enough professional astronomers to watch it all, all of the time! With the release of ever more public datasets, such as the 2011 Census data, might there be similar opportunities for dedicated amateur statisticians to help make sense out of it all? As citizen journalists, might we be able to spot interesting stories within the data, particularly at a local or regional level familiar to us, that may have been missed in national reports? And as creative technologists, might we be able to find new ways of exploring or visualising the data that make it easier to analyse for amateurs and professionals alike?

As a recently advertised job for the "Head of Rich Content" at the ONS suggests, the ONS website should anticipate and meet user needs and expectations, including those of the Citizen User [my emphasis]. Are you one of those users? And if so, how are you using the ONS website in general, and the ONS census website in particular? Let us know via the comments below.

[DISCLAIMER: In the interests of full disclosure, OpenLearn’s Laura Dewis will be leaving the OU to work for the ONS in early 2013. This article reflects the opinions of Tony Hirst, the author, and Laura wasn’t involved in writing or publishing any aspect of it. Everyone at OpenLearn looks forward to seeing how the ONS website develops with Laura on board their team.]


