"Besides everything else, look at the data, look at the facts about the world," enthuses Hans Rosling in Don't Panic, encouraging us all to look to the data to make sense of the world, rather than relying our preconceived ideas about fertility rates or literacy rates around the world.
That's all very well for a statistician to say, but how would the rest of us even know where to start if we wanted to explore such data visualisations ourselves?
One good place is with Hans Rosling's own Gapminder site (although if you're on a tablet computer, you may find it doesn't work—try on your laptop or desktop machine instead).
For several years, the Gapminder Foundation has taken on the role of a 'modern "museum" on the Internet', curating a collection of datasets relating to international health and development indicators from a range of trusted sources. To help bring the data alive, site visitors can use a range of interactive software tools—most notably the "Trendalyser"—to engage with the data.
As well as using the application to animate how the data changes over time, we can also use it to generate static images to illustrate a particular story. For example, the following screenshot shows how we can track the relationship between the average income per person and the number of children per woman (the fertility rate) for different countries over the last two hundred years or so.
In the example shown here, I've highlighted the data relating to Bangladesh, as well as keeping track of how that relationship evolved over time (try it here).
The Gapminder application allows you to compare a whole range of indicators from within the application itself, and share links that include the settings for a specific setup.
Finding the Data
Whilst the Gapminder Trendalyser application is a powerful tool for exploring a range of datasets, for some situations you may want to get hold of the data yourself. Once again, the Gapminder website is a great place to start: all the data we can explore with the application can be downloaded from the site as simple spreadsheet data files.
You may notice that the original source of each dataset is also provided. Although there is an increasing amount of data published on the web, finding good-quality, trusted data can often be problematic, so it's encouraging to see that Gapminder both publishes a description of the provenance of the data it uses and draws that data from reliable sources.
If you are keen to start exploring the data riches that are published by some of the international development agencies, here's a list of sites to get you started:
- United Nations population data: includes population data and forecasts, fertility and mortality data, migration data;
- World Bank data portal: a wide-ranging data collection relating to development indicators for countries around the world. Explore by topic, indicator, or country. Includes tabular and graphical views of the data, as well as data downloads;
- United Nations Statistics Division: a wide-ranging collection of datasets relating to economics, demographics, energy, etc. Not for the faint-hearted!
- International Labour Organisation: a range of databases and datasets relating to labour statistics;
- FAO (Food and Agriculture Organisation of the United Nations): check the "related links" section for links to sector-specific statistics relating to economic and trade data, forestry data, fisheries data and water data;
- ITU (International Telecommunication Union): statistics relating to telecommunications and IT development around the world.
When looking for data, try to find files that have the suffix .xls or .xlsx (spreadsheet files) or .csv or .tsv (simple text files containing comma-separated or tab-separated values, respectively). If the data appears as a table in a PDF or word processor document, it's much harder to get it into a form that you can actually work with, though it can be done if you have the right tools available.
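If you do find a .csv or .tsv file, a few lines of code are enough to read it. The following is a minimal sketch in Python, using only the standard library. The function names (read_rows, load_table) and the sample values are made up for illustration, and a real download may need extra handling for things such as character encodings or notes placed above the header row.

```python
import csv
import io
from pathlib import Path

# Map each file suffix to the delimiter used inside that kind of file.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_rows(text, suffix):
    """Parse delimited text into a list of dicts keyed by the header row."""
    delimiter = DELIMITERS[suffix.lower()]
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def load_table(path):
    """Read a .csv or .tsv file from disk, picking the delimiter from its suffix."""
    path = Path(path)
    return read_rows(path.read_text(encoding="utf-8"), path.suffix)


# Made-up sample values, for illustration only (not real statistics).
sample_csv = "country,year,fertility\nExampleland,1973,6.9\nExampleland,2013,2.2\n"
sample_tsv = sample_csv.replace(",", "\t")

csv_rows = read_rows(sample_csv, ".csv")
tsv_rows = read_rows(sample_tsv, ".tsv")
print(csv_rows[0])
```

Note that every value comes back as text, so numbers need converting (for example with int or float) before you can do any arithmetic with them. Spreadsheet files (.xls or .xlsx) need a third-party library; the read_excel function in pandas is one commonly used option.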
Motion Charts via Google Charts
In recent years, the type of chart that Hans Rosling and the Gapminder Foundation popularised has started to appear more widely as a chart type known as a motion chart. In the same way that you can use a spreadsheet to generate line charts or bar charts from your own data, some spreadsheet applications also support motion charts.
For example, if you want to create a motion chart from your own data that is contained in a Google Spreadsheet, all you need to do is make sure that the data is organised in the following way:
- the first column should contain the names of the things you want to track, such as country names. Apart from the date column described next, all subsequent columns can contain either numeric or text data.
- the second column must include date values; this might just be a year (for example, a number-formatted column containing a value such as 1973), a full date (as a date-formatted column), a week number (for example, 2008W03, as a string column type) or quarter format (for example, 2008Q3, again as a string column type).
- the other columns should contain numbers, which will be available in the X, Y, Color and Size axes of the motion chart; or text, the options for which will only appear in the Color menu.
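Data often arrives in a different shape, for example as a separate table per indicator with one entry per country and year. As a rough sketch of how you might rearrange it into the column order described above, here is a short Python example. It assumes a hypothetical input structure (a dictionary per indicator, mapping country to year to value), and all the names and numbers in it are invented for illustration.

```python
# Hypothetical inputs: one dict per indicator, mapping
# country -> {year: value}. All values are made up for illustration.
income = {
    "Exampleland": {1973: 500, 2013: 2500},
    "Samplestan": {1973: 900, 2013: 4000},
}
fertility = {
    "Exampleland": {1973: 6.9, 2013: 2.2},
    "Samplestan": {1973: 5.1, 2013: 1.9},
}


def to_motion_chart_rows(indicators):
    """Build rows ordered as: name, year, then one column per indicator."""
    names = list(indicators)  # indicator names, in column order
    header = ["Country", "Year"] + names
    # Collect every (country, year) pair that appears in any indicator.
    keys = sorted(
        {
            (country, year)
            for table in indicators.values()
            for country, years in table.items()
            for year in years
        }
    )
    rows = [header]
    for country, year in keys:
        # A missing value is left as None rather than guessed.
        values = [indicators[n].get(country, {}).get(year) for n in names]
        rows.append([country, year] + values)
    return rows


rows = to_motion_chart_rows({"Income": income, "Fertility": fertility})
for row in rows:
    print(row)
```

Each printed row has the name first, the year second and the numeric indicators after that, so the result could be written out as a .csv file and imported into a spreadsheet.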
Here is an example of a motion chart in Google Spreadsheets; unfortunately, as with the Gapminder Trendalyser application, it's unlikely to work on many tablet computers.
Google Motion Charts can also be generated from visualisation toolkits associated with data analysis programming tools such as R. This is a little too involved to describe here, but if you're interested, these links should get you started: googleVis toolkit (R), First steps of using googleVis on shiny.
Web Native Motion Charts
One of the first-world problems associated with the Gapminder Trendalyser and Google motion charts is that they don't work on many tablet computers, because they depend on specific browser plugins. So what options are left to us if we want to make use of such visualisations, safe in the knowledge that we will actually be able to see them working on a wider range of computers?
One possibility is to make use of one of the growing number of visualisation libraries that have been developed to support rich interactive data visualisations using standardised (and near universal) web technologies.
For example, the following motion chart has been constructed using the popular d3.js visualisation toolkit (click here for the interactive version).
Whilst it's getting easier to generate such charts from data analysis tools such as R (in this case, using the rCharts library, for example), components such as the motion chart still require a certain amount of programming knowledge to get the data into them and the charts actually up and running.
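To give a flavour of the kind of programming involved, web-based charts typically load their data as JSON, so a common first step is to regroup a table of rows into one record per country. The sketch below does this in Python; the record layout (a name plus lists of [year, value] pairs) is an assumption made for illustration rather than the format required by any particular chart, and the values are invented.

```python
import json
from collections import defaultdict

# Made-up rows in the order: country, year, income, fertility.
rows = [
    ("Exampleland", 1973, 500, 6.9),
    ("Exampleland", 2013, 2500, 2.2),
    ("Samplestan", 1973, 900, 5.1),
]


def group_by_country(rows):
    """Return one record per country holding [year, value] pairs per indicator."""
    grouped = defaultdict(lambda: {"income": [], "fertility": []})
    for country, year, income, fertility in rows:
        grouped[country]["income"].append([year, income])
        grouped[country]["fertility"].append([year, fertility])
    # Sort by country name so the output order is predictable.
    return [{"name": name, **series} for name, series in sorted(grouped.items())]


records = group_by_country(rows)
chart_json = json.dumps(records, indent=2)
print(chart_json)
```

The resulting JSON text could then be saved to a file for a chart to load, although the exact layout a given chart expects will depend on how that chart was written.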
So what's stopping you...?
Once upon a time it was quite hard for anyone other than academics or public servants operating at national or international agencies to get hold of timely, good-quality data. With more and more data being published onto the web by these agencies, we are now in a position where we can start learning directly from the data ourselves, as well as checking the stories that others tell us based on their reading of, or spin applied to, the data.
Now you've seen where the data lives, as well as some ways of quizzing it using free online interactive data visualisation tools. So which of your assumptions are you now going to put to the data test?