Learn to code for data analysis

1.6 Calculations over columns

The last remaining task is to calculate the death rate of each country.
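The rate is expressed as deaths per 100,000 inhabitants, the unit named in the column that is added later in this section. Because the population figures are given in thousands, dividing the deaths by the number of people and scaling to 100,000 reduces to multiplying by 100 and dividing by the population in thousands. A quick check of that arithmetic, using Angola's figures from the table in this section (the names rateLong and rateShort are only for illustration):

```python
# Angola's figures, taken from the table in this section.
deathsInAngola = 6900
populationOfAngola = 21472  # in thousands of inhabitants

# Long form: convert to people, then scale to a rate per 100,000 inhabitants.
rateLong = deathsInAngola / (populationOfAngola * 1000) * 100000

# Short form used in this section: the factors 1000 and 100,000 reduce to 100.
rateShort = deathsInAngola * 100 / populationOfAngola

print(round(rateLong, 6), round(rateShort, 6))
```

Both forms should agree up to floating-point rounding.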

You may recall that with the simple approach I’d have to write:

rateAngola = deathsInAngola * 100 / populationOfAngola

rateBrazil = deathsInBrazil * 100 / populationOfBrazil

and so on, and so on. If you’ve used spreadsheets, it’s the same process: create the formula for the first row and then copy it down for all the rows. This is laborious and error-prone, e.g. if rows are added later on. Given that data is organised by columns, wouldn’t it be nice to simply write the following?

rateColumn = deathsColumn * 100 / populationColumn

Say no more: your wish is pandas’s command.

In []:

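# Select the two columns needed for the calculation.
deathsColumn = data['TB deaths']
populationColumn = data['Population (1000s)']
# Compute the rate for every row with a single expression.
rateColumn = deathsColumn * 100 / populationColumn
rateColumn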

Out[]:

0     32.134873
1      2.196025
2      2.942576
3      8.850727
4     70.422535
5     19.167186
6     69.675621
7      1.319759
8     11.901928
9      9.326425
10    47.370017
11    87.378641
dtype: float64

Tadaaa! With pandas, the arithmetic operators become much smarter. When adding, subtracting, multiplying or dividing columns, the computer understands that the operation is to be done row by row and creates a new column.
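To see the row-by-row behaviour in isolation, here is a minimal sketch that does not depend on the course's data file. It assumes two small stand-in columns built by hand from the first three rows of the table in this section, and compares the column expression with the same calculation spelled out one row at a time (rateByHand is an illustrative name, not part of the course code):

```python
import pandas as pd

# Stand-in columns: the first three rows (Angola, Brazil, China).
deathsColumn = pd.Series([6900, 4400, 41000])
populationColumn = pd.Series([21472, 200362, 1393337])  # in thousands

# Column arithmetic: one expression, applied to every row.
rateColumn = deathsColumn * 100 / populationColumn

# The same calculation written out row by row, for comparison.
rateByHand = [d * 100 / p for d, p in zip(deathsColumn, populationColumn)]

print(rateColumn)
print(rateByHand)
```

The column expression stays the same however many rows the data has, which is what makes it less error-prone than copying a formula down a spreadsheet.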

All well and good, but how do I put that new column into the dataframe, so that everything is in a single table? In an assignment variable = expression, if the variable hasn’t been mentioned before, the computer creates the variable and stores the expression’s value in it. Likewise, if I assign to a column that doesn’t exist in the dataframe, the computer will create it.

In []:

data['TB deaths (per 100,000)'] = rateColumn

data

Out[]:

                  Country  Population (1000s)  TB deaths  TB deaths (per 100,000)
 0                 Angola               21472       6900                32.134873
 1                 Brazil              200362       4400                 2.196025
 2                  China             1393337      41000                 2.942576
 3      Equatorial Guinea                 757         67                 8.850727
 4          Guinea-Bissau                1704       1200                70.422535
 5                  India             1252140     240000                19.167186
 6             Mozambique               25834      18000                69.675621
 7               Portugal               10608        140                 1.319759
 8     Russian Federation              142834      17000                11.901928
 9  Sao Tome and Principe                 193         18                 9.326425
10           South Africa               52776      25000                47.370017
11            Timor-Leste                1133        990                87.378641
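The column assignment can also be tried on a small, self-contained dataframe. The sketch below assumes a three-row stand-in for the course data, typed in by hand rather than loaded from the course's file, and records the column names before and after the assignment (columnsBefore and columnsAfter are illustrative names):

```python
import pandas as pd

# Hand-typed stand-in for the course dataframe (first three rows only).
data = pd.DataFrame({
    'Country': ['Angola', 'Brazil', 'China'],
    'Population (1000s)': [21472, 200362, 1393337],
    'TB deaths': [6900, 4400, 41000],
})
columnsBefore = list(data.columns)

# Assigning to a column name that does not exist yet creates the column.
data['TB deaths (per 100,000)'] = data['TB deaths'] * 100 / data['Population (1000s)']
columnsAfter = list(data.columns)

print(columnsBefore)
print(columnsAfter)
```

The new column is appended after the existing ones, as in the table above.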

That’s it! I’ve written all the code needed to answer the questions I had. Next I’ll write up the analysis into a succinct and stand-alone notebook that can be shared with friends, family and colleagues or the whole world. You’ll find that in the next section.