Learn to code for data analysis
Learn to code for data analysis

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

Free course

Learn to code for data analysis

1.4 Joining and transforming

With the little tables, I first transformed the columns and then joined the tables.

As you may be starting to realise, there’s often more than one way to do it. Just for illustration, I’ll do the other way round for the big tables. Here are the tables, as a reminder.

In []:

life.head()

Out[]:

countrySP.DYN.LE00.IN
34Afghanistan60.931415
35Albania77.537244
36Algeria71.009659
39Angola51.866171
40Antigua and Barbuda75.829293

In []:

gdp.head()

Out[]:

countryNY.GDP.MKTP.CD
34Afghanistan2.031088e+10
35Albania1.291667e+10
36Algeria2.101834e+11
38Andorra3.249101e+09
39Angola1.241632e+11

First, an inner join on the common column to combine rows where the common column value appears in both tables.

In []:

gdpVsLife = merge(gdp, life, on='country', how='inner')

gdpVsLife.head()

Out []:

countryNY.GDP.MKTP.CDSP.DYN.LE00.IN
0Afghanistan2.031088e+1060.931415
1Albania1.291667e+1077.537244
2Algeria2.101834e+1171.009659
3Angola1.241632e+1151.866171
4Antigua and Barbuda1.200588e+0975.829293

Second, the dollars are converted to millions of pounds.

In []:

GDP = 'GDP (£m)'

column = gdpVsLife[GDP_INDICATOR]

gdpVsLife[GDP] = column.apply(usdToGbp).apply(roundToMillions)

gdpVsLife.head()

Out[]:

countryNY.GDP.MKTP.CDSP.DYN.LE00.INGDP (£m)
0Afghanistan2.031088e+1060.93141512980
1Albania1.291667e+1077.5372448255
2Algeria2.101834e+1171.009659134322
3Angola1.241632e+1151.86617179349
4Antigua and Barbuda1.200588e+0975.829293767

Third, the life expectancy is rounded to the nearest integer, with a by now familiar function.

In []:

LIFE = 'Life expectancy (years)'

gdpVsLife[LIFE] = gdpVsLife[LIFE_INDICATOR].apply(round)

gdpVsLife.head()

Out[]:

countryNY.GDP.MKTP.CDSP.DYN.LE00.INGDP (£m)Life expectancy (years)
0Afghanistan2.031088e+1060.9314151298061
1Albania1.291667e+1077.537244825578
2Algeria2.101834e+1171.00965913432271
3Angola1.241632e+1151.8661717934952
4Antigua and Barbuda1.200588e+0975.82929376776

Lastly, the original columns are discarded.

In []:

headings = [COUNTRY, GDP, LIFE]

gdpVsLife = gdpVsLife[headings]

gdpVsLife.head()

Out[]:

countryGDP (£m)Life expectancy (years)
0Afghanistan1298061
1Albania825578
2Algeria13432271
3Angola7934952
4Antigua and Barbuda76776

For the first five countries there doesn’t seem to be any relation between wealth and life expectancy, but that might be just for those countries.

Exercise 9 Joining and transforming

Have a go at merging dataframes with an inner join in Exercise 9 in the Exercise notebook 3.

LCDAB_1

Take your learning further

Making the decision to study can be a big step, which is why you'll want a trusted University. The Open University has 50 years’ experience delivering flexible learning and 170,000 students are studying with us right now. Take a look at all Open University courses.

If you are new to University-level study, we offer two introductory routes to our qualifications. You could either choose to start with an Access module, or a module which allows you to count your previous learning towards an Open University qualification. Read our guide on Where to take your learning next for more information.

Not ready for formal University study? Then browse over 1000 free courses on OpenLearn and sign up to our newsletter to hear about new free courses as they are released.

Every year, thousands of students decide to study with The Open University. With over 120 qualifications, we’ve got the right course for you.

Request an Open University prospectus371