Learn to code for data analysis

Week 5: Combine and transform data Part 1


Welcome to Week 5.

Please note: in the following video, where reference is made to a study ‘week’, this corresponds to Weeks 5 and 6 of this course.

Transcript:


So far you've looked at one data set at a time. Data analysis becomes more interesting when multiple data sets are put together. Data sets can be combined in various ways and for different reasons. The simplest option is to combine the same kind of data. One possible reason is to aggregate information. For example, data on the incidence of a disease is collected by regional health services, aggregated by governments at national level, and further aggregated by international agencies like the World Health Organization. Another reason is to slice data in a different way. For example, weather data sets for different years can be combined to see if a particular month is getting hotter or wetter over the decades.
Using multiple data sets makes it more likely that the data will not be in the format you need. For example, if one data set provides temperatures in Fahrenheit and the other in Celsius, you'll need to convert one unit to the other to combine both data sets. This is where the power of programming really begins to show. As you'll see this week, Python allows you to code your own data transformations. Combining different types of data is even more interesting. For example, to really appreciate the effect of changes in the minimum wage over time, it can be combined with the cost of living or the number of people on the minimum wage over the same period.
Combining different types of data requires that they share at least one uniquely identifying characteristic. For example, the minimum wage and the cost of living must both be about the same country or the same period of time. So that's what we'll be covering this week: how to transform data so that it can then be combined on some common characteristic. I hope you enjoy it.
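To make the temperature example from the video concrete, here is a minimal sketch in plain Python. It is only an illustration, not the method used later in the course: the function name `fahrenheit_to_celsius`, the variable names and all the temperature values are invented for this example. It transforms one data set so that both use the same unit, combines the two, and then slices the result by month.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Invented readings for illustration: one data set is in Celsius,
# the other in Fahrenheit.
readings_2000_c = [
    {"month": "Jan", "year": 2000, "temp": 4.0},
    {"month": "Jul", "year": 2000, "temp": 18.0},
]
readings_2010_f = [
    {"month": "Jan", "year": 2010, "temp": 41.0},
    {"month": "Jul", "year": 2010, "temp": 68.0},
]

# Transform the Fahrenheit data set so that both use the same unit.
readings_2010_c = [
    {**row, "temp": round(fahrenheit_to_celsius(row["temp"]), 1)}
    for row in readings_2010_f
]

# Combine the same kind of data by appending one list of rows to the other.
combined = readings_2000_c + readings_2010_c

# Slice the combined data a different way: all January readings across years.
january = [row for row in combined if row["month"] == "Jan"]
print(january)
```

The conversion has to happen before the data sets are combined. Otherwise the `temp` values would mix two units and comparing them would be meaningless.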

In Week 1 you worked on a dataset that combined two different World Health Organization datasets: population and the number of deaths due to tuberculosis.

They could be combined because they share a common attribute: the countries. This week you will learn the techniques behind the creation of such a combined dataset.
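As a rough sketch of what combining on a common attribute means, the plain Python below joins two small data sets on the country name. The country names and figures are made up for illustration and are not WHO data, and the variable names are hypothetical. The techniques taught this week may look different.

```python
# Made-up figures for illustration only; these are not WHO data.
population = {
    "Country A": 1_000_000,
    "Country B": 250_000,
    "Country C": 4_000_000,
}
tb_deaths = {
    "Country A": 50,
    "Country B": 10,
    "Country D": 7,
}

# The common attribute is the country name: keep only the countries
# that appear in both data sets.
common_countries = sorted(population.keys() & tb_deaths.keys())

# Build one combined row per country, holding the values from both data sets.
combined = [
    {
        "country": country,
        "population": population[country],
        "tb_deaths": tb_deaths[country],
    }
    for country in common_countries
]

for row in combined:
    print(row)
```

Country C and Country D are left out of the result because each appears in only one of the two data sets. Whether to drop or keep such unmatched rows is a choice you make when combining data.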

