1.4 Joining and transforming
With the little tables, I first transformed the columns and then joined the tables.
As you may be starting to realise, there’s often more than one way to do it. Just for illustration, I’ll do the other way round for the big tables. Here are the tables, as a reminder.
In []:
life.head()
Out[]:
country | SP.DYN.LE00.IN | |
---|---|---|
34 | Afghanistan | 60.931415 |
35 | Albania | 77.537244 |
36 | Algeria | 71.009659 |
39 | Angola | 51.866171 |
40 | Antigua and Barbuda | 75.829293 |
In []:
gdp.head()
Out[]:
country | NY.GDP.MKTP.CD | |
---|---|---|
34 | Afghanistan | 2.031088e+10 |
35 | Albania | 1.291667e+10 |
36 | Algeria | 2.101834e+11 |
38 | Andorra | 3.249101e+09 |
39 | Angola | 1.241632e+11 |
First, an inner join on the common column to combine rows where the common column value appears in both tables.
In []:
gdpVsLife = merge(gdp, life, on='country', how='inner')
gdpVsLife.head()
Out []:
country | NY.GDP.MKTP.CD | SP.DYN.LE00.IN | |
---|---|---|---|
0 | Afghanistan | 2.031088e+10 | 60.931415 |
1 | Albania | 1.291667e+10 | 77.537244 |
2 | Algeria | 2.101834e+11 | 71.009659 |
3 | Angola | 1.241632e+11 | 51.866171 |
4 | Antigua and Barbuda | 1.200588e+09 | 75.829293 |
Second, the dollars are converted to millions of pounds.
In []:
GDP = 'GDP (£m)'
column = gdpVsLife[GDP_INDICATOR]
gdpVsLife[GDP] = column.apply(usdToGbp).apply(roundToMillions)
gdpVsLife.head()
Out[]:
country | NY.GDP.MKTP.CD | SP.DYN.LE00.IN | GDP (£m) | |
---|---|---|---|---|
0 | Afghanistan | 2.031088e+10 | 60.931415 | 12980 |
1 | Albania | 1.291667e+10 | 77.537244 | 8255 |
2 | Algeria | 2.101834e+11 | 71.009659 | 134322 |
3 | Angola | 1.241632e+11 | 51.866171 | 79349 |
4 | Antigua and Barbuda | 1.200588e+09 | 75.829293 | 767 |
Third, the life expectancy is rounded to the nearest integer, with a by now familiar function.
In []:
LIFE = 'Life expectancy (years)'
gdpVsLife[LIFE] = gdpVsLife[LIFE_INDICATOR].apply(round)
gdpVsLife.head()
Out[]:
country | NY.GDP.MKTP.CD | SP.DYN.LE00.IN | GDP (£m) | Life expectancy (years) | |
---|---|---|---|---|---|
0 | Afghanistan | 2.031088e+10 | 60.931415 | 12980 | 61 |
1 | Albania | 1.291667e+10 | 77.537244 | 8255 | 78 |
2 | Algeria | 2.101834e+11 | 71.009659 | 134322 | 71 |
3 | Angola | 1.241632e+11 | 51.866171 | 79349 | 52 |
4 | Antigua and Barbuda | 1.200588e+09 | 75.829293 | 767 | 76 |
Lastly, the original columns are discarded.
In []:
headings = [COUNTRY, GDP, LIFE]
gdpVsLife = gdpVsLife[headings]
gdpVsLife.head()
Out[]:
country | GDP (£m) | Life expectancy (years) | |
---|---|---|---|
0 | Afghanistan | 12980 | 61 |
1 | Albania | 8255 | 78 |
2 | Algeria | 134322 | 71 |
3 | Angola | 79349 | 52 |
4 | Antigua and Barbuda | 767 | 76 |
For the first five countries there doesn’t seem to be any relation between wealth and life expectancy, but that might be just for those countries.
Exercise 9 Joining and transforming
Have a go at merging dataframes with an inner join in Exercise 9 in the Exercise notebook 3.