Session 3: Data wrangling
This session is written by Hugo Leal from the University of Cambridge.
Look around. Almost everything you see is data or can be turned into data. The text you are reading right now is data. The idea that characters on a screen are data may sound surprising or daunting. A nineteenth-century digitised letter, an online news article or a post on social media are all data too. They can all be created, collected, wrangled, queried, analysed, visualised and interpreted to address research questions.
In the last session you learned about accessing and collecting data. Most of the digital data we collect are ‘raw’: they haven’t been prepared for any type of analysis. Raw data often contain omissions, errors and inconsistencies, which is why we call them messy data. In this session you will look at the basic steps that take raw, messy data to a standard model, such as a table. This vital process in digital research is called ‘data wrangling’: converting raw data into an organised dataset suitable for analysis, visualisation and interpretation. In this session and the next, you will use digitised letters of Charles Darwin to illustrate both data wrangling and data analysis.
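To make the idea of moving from messy data to a table more concrete, here is a minimal sketch in Python. It is only an illustration: the records are invented for this example (they are not real letters), and the function names `clean_name`, `clean_year` and `wrangle` are hypothetical choices, not part of any particular tool used in this course.

```python
# Invented example records (not real letters). Each raw line is meant to hold
# "sender; recipient; year", but the lines are messy in different ways.
raw_records = [
    "  Alice Example ; Bob Sample ; 1859 ",  # stray spaces
    "alice example;Bob Sample;1860",         # inconsistent capitalisation
    "Carol Placeholder; ; 1861",             # omission: recipient missing
    "Bob Sample; Alice Example; 18x2",       # error: year is not a number
]


def clean_name(text):
    """Trim extra whitespace and capitalise consistently; None if empty."""
    text = " ".join(text.split())
    return text.title() if text else None


def clean_year(text):
    """Return the year as an integer, or None if it cannot be read."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def wrangle(lines):
    """Turn raw lines into a table: a list of rows sharing the same columns."""
    table = []
    for line in lines:
        sender, recipient, year = line.split(";")
        table.append({
            "sender": clean_name(sender),
            "recipient": clean_name(recipient),
            "year": clean_year(year),
        })
    return table


letters = wrangle(raw_records)
for row in letters:
    print(row)
```

Every row now has the same three columns, names are written consistently, and values that are missing or unreadable are marked explicitly as `None` rather than silently guessed. Real wrangling usually involves more steps and more judgement, but the principle is the same.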
By the end of this session, you should be able to:
- understand the importance of data wrangling for digital methods
- grasp the specific elements of the data wrangling process.