Course content Course content

Learn to code for data analysis

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

More free courses

Week 6: Combine and transform data Part 2

1 Joining left, right and centre

Let’s take stock for a moment. There’s the original, unchanged table (with full country names) about the life expectancy:

In []:

life

Out[]:

	Country name	Life expectancy (years)
0	China	75
1	Russia	71
2	United States	79
3	India	66
4	United Kingdom	81

… and a table with the GDP in millions of pounds and also full country names.

In []:

gdp

Out[]:

	Country name	GDP (£m)
0	United Kingdom	1711727
1	United States	10716029
2	China	5905202
3	Brazil	1435148
4	South Africa	233937

Both tables have a common column with a common name (‘Country name’). I can join the two tables on that common column, using the merge() function. Merging basically puts all columns of the two tables together, without duplicating the common column, and joins any rows that have the same value in the common column.

There are four possible ways of joining, depending on which rows I want to include in the resulting table. If I want to include only those countries appearing in the GDP table, I call the merge() function like so:

In []:

merge(gdp, life, on='Country name', how='left')

Out[]:

	Country name	GDP (£m)	Life expectancy (years)
0	United Kingdom	1711727	81
1	United States	10716029	79
2	China	5905202	75
3	Brazil	1435148	NaN
4	South Africa	233937	NaN

The first two arguments are the tables to be merged, with the first table being called the ‘left’ table and the second being the ‘right’ table. The on argument is the name of the common column, i.e. both tables must have a column with that name. The how argument states I want a left join , i.e. the resulting rows are dictated by the left (GDP) table. You can easily see that India and Russia, which appear only in the right (expectancy) table, don’t show up in the result. You can also see that Brazil and South Africa, which appear only in the left table, have an undefined life expectancy. (Remember that ‘NaN’ stands for ‘not a number.)

A right join will instead take the rows from the right table, and add the columns of the left table. Therefore, countries not appearing in the left table will have undefined values for the left table’s columns:

In []:

merge(gdp, life, on='Country name', how='right')

Out[]:

	Country name	GDP (£m)	Life expectancy (years)
0	United Kingdom	1711727	81
1	United States	10716029	79
2	China	5905202	75
3	Russia	NaN	71
4	India	NaN	66

The third possibility is an outer join which takes all countries, i.e. whether they are in the left or right table. The result has all the rows of the left and right joins:

In []:

merge(gdp, life, on='Country name', how='outer')

Out[]:

	Country name	GDP (£m)	Life expectancy (years)
0	United Kingdom	1711727	81
1	United States	10716029	79
2	China	5905202	75
3	Brazil	1435148	NaN
4	South Africa	233937	NaN
5	Russia	NaN	71
6	India	NaN	66

The last possibility is an inner join which takes only those countries common to both tables, i.e. for which I know the GDP and the life expectancy. That’s the join I want, to avoid any undefined values:

In []:

gdpVsLife = merge(gdp, life, on='Country name', how='inner')

Out[]:

	Country name	GDP (£m)	Life expectancy (years)
0	United Kingdom	1711727	81
1	United States	10716029	79
2	China	5905202	75

Now it’s just a matter of applying the data transformation and combination techniques seen so far to the real data from the World Bank.

Exercise 5 Joining left, right and centre

Put your learning into practice by completing Exercise 5 in the Exercise notebook 3.

Remember to run the existing code in the notebook before you start the exercise. When you’ve completed the exercise, save the notebook.

Next 1.1 Constant variables

My OpenLearn Profile

About this free course

Become an OU student

Download this course

Share this free course

Week 6: Combine and transform data Part 2

1 Joining left, right and centre

Exercise 5 Joining left, right and centre