Skip to main content

About this free course

Download this course

Share this free course

Learn to code for data analysis
Learn to code for data analysis

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

1.4 Joining and transforming

With the little tables, I first transformed the columns and then joined the tables.

An image of a bride and groom holding hands with the minister between them in the background
Figure 2

As you may be starting to realise, there’s often more than one way to do it. Just for illustration, I’ll do the other way round for the big tables. Here are the tables, as a reminder.

In []:

life.head()

Out[]:

countrySP.DYN.LE00.IN
34Afghanistan60.931415
35Albania77.537244
36Algeria71.009659
39Angola51.866171
40Antigua and Barbuda75.829293

In []:

gdp.head()

Out[]:

countryNY.GDP.MKTP.CD
34Afghanistan2.031088e+10
35Albania1.291667e+10
36Algeria2.101834e+11
38Andorra3.249101e+09
39Angola1.241632e+11

First, an inner join on the common column to combine rows where the common column value appears in both tables.

In []:

gdpVsLife = merge(gdp, life, on='country', how='inner')

gdpVsLife.head()

Out []:

countryNY.GDP.MKTP.CDSP.DYN.LE00.IN
0Afghanistan2.031088e+1060.931415
1Albania1.291667e+1077.537244
2Algeria2.101834e+1171.009659
3Angola1.241632e+1151.866171
4Antigua and Barbuda1.200588e+0975.829293

Second, the dollars are converted to millions of pounds.

In []:

GDP = 'GDP (£m)'

column = gdpVsLife[GDP_INDICATOR]

gdpVsLife[GDP] = column.apply(usdToGbp).apply(roundToMillions)

gdpVsLife.head()

Out[]:

countryNY.GDP.MKTP.CDSP.DYN.LE00.INGDP (£m)
0Afghanistan2.031088e+1060.93141512980
1Albania1.291667e+1077.5372448255
2Algeria2.101834e+1171.009659134322
3Angola1.241632e+1151.86617179349
4Antigua and Barbuda1.200588e+0975.829293767

Third, the life expectancy is rounded to the nearest integer, with a by now familiar function.

In []:

LIFE = 'Life expectancy (years)'

gdpVsLife[LIFE] = gdpVsLife[LIFE_INDICATOR].apply(round)

gdpVsLife.head()

Out[]:

countryNY.GDP.MKTP.CDSP.DYN.LE00.INGDP (£m)Life expectancy (years)
0Afghanistan2.031088e+1060.9314151298061
1Albania1.291667e+1077.537244825578
2Algeria2.101834e+1171.00965913432271
3Angola1.241632e+1151.8661717934952
4Antigua and Barbuda1.200588e+0975.82929376776

Lastly, the original columns are discarded.

In []:

headings = [COUNTRY, GDP, LIFE]

gdpVsLife = gdpVsLife[headings]

gdpVsLife.head()

Out[]:

countryGDP (£m)Life expectancy (years)
0Afghanistan1298061
1Albania825578
2Algeria13432271
3Angola7934952
4Antigua and Barbuda76776

For the first five countries there doesn’t seem to be any relation between wealth and life expectancy, but that might be just for those countries.

Exercise 9 Joining and transforming

Have a go at merging dataframes with an inner join in Exercise 9 in the Exercise notebook 3.