Digital humanities: humanities research in the digital age


Session 3 Data wrangling

This session is written by Hugo Leal from The University of Cambridge.

Look around. Almost everything you see is data or can be turned into data. The text you are reading right now is data. The idea that characters on a screen are data may sound surprising or daunting. A nineteenth-century digitised letter, an online news article or a post on social media are all data too. They can all be created, collected, wrangled, queried, analysed, visualised and interpreted to address research questions.

In the last session you learned about accessing and collecting data. Most digital data we collect are ‘raw’ – they haven’t been prepared for any type of analysis. Raw data often contain omissions, errors and inconsistencies; we call such data ‘messy’. In this session you will look at the basic steps that take raw, messy data to a standard model, such as a table. This vital process in digital research is called ‘data wrangling’: it converts raw data into an organised dataset suitable for analysis, visualisation and interpretation. In this session and the next, you will work with digitised letters of Charles Darwin, which illustrate both data wrangling and data analysis.
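To make the idea concrete, here is a minimal sketch in Python of what moving from messy records to a table can look like. It is not the method used in this course. The records, the field names (`sender`, `date`, `place`) and the `wrangle` function are all hypothetical, invented purely for illustration. The sketch handles one omission (a missing place), one inconsistency (two date separators and mixed capitalisation) and one duplicate.

```python
# Hypothetical raw records, invented for illustration only.
# Each line is meant to hold: sender ; date ; place
raw_records = [
    "  Sender A ; 1859-11-24 ; London ",
    "sender b;1860/01/05;",             # inconsistent case and date style, missing place
    "Sender A ; 1859-11-24 ; London ",  # duplicate of the first record
]


def wrangle(records):
    """Turn messy semicolon-separated records into a list of table rows."""
    rows = []
    seen = set()
    for record in records:
        # Split into fields and trim stray whitespace.
        fields = [field.strip() for field in record.split(";")]
        # Pad short records so every row has exactly three columns.
        fields += [""] * (3 - len(fields))
        sender, date, place = fields[:3]

        sender = sender.title()        # consistent capitalisation
        date = date.replace("/", "-")  # one date separator throughout
        place = place or "unknown"     # mark omissions explicitly

        row = (sender, date, place)
        if row in seen:                # skip exact duplicates
            continue
        seen.add(row)
        rows.append({"sender": sender, "date": date, "place": place})
    return rows


table = wrangle(raw_records)
for row in table:
    print(row)
```

Each dictionary in `table` is one row, and each key is one column, which is the kind of standard model that analysis and visualisation tools expect. Real sources are usually messier than this, and decisions such as how to record a missing value depend on the research question.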

Video 3

By the end of this session, you should be able to:

  • understand the importance of data wrangling for digital methods
  • grasp the specific elements of the data wrangling process.