1.6 Calculations over columns
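The last remaining task is to calculate the death rate of each country, as TB deaths per 100,000 inhabitants. Because the population is given in thousands, that rate is the number of deaths multiplied by 100 and divided by the population.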
You may recall that with the simple approach I’d have to write:
rateAngola = deathsInAngola * 100 / populationOfAngola
rateBrazil = deathsInBrazil * 100 / populationOfBrazil
and so on, and so on. If you’ve used spreadsheets, it’s the same process: create the formula for the first row and then copy it down for all the rows. This is laborious and error-prone: for example, if rows are added later on, the formula has to be copied down again. Given that data is organised by columns, wouldn’t it be nice to simply write the following?
rateColumn = deathsColumn * 100 / populationColumn
Say no more: your wish is pandas’s command.
In []:
deathsColumn = data['TB deaths']
populationColumn = data['Population (1000s)']
rateColumn = deathsColumn * 100 / populationColumn
rateColumn
Out[]:
0 32.134873
1 2.196025
2 2.942576
3 8.850727
4 70.422535
5 19.167186
6 69.675621
7 1.319759
8 11.901928
9 9.326425
10 47.370017
11 87.378641
dtype: float64
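Tadaaa! With pandas, the arithmetic operators become much smarter. When adding, subtracting, multiplying or dividing columns, pandas does the operation row by row, pairing up the values that have the same row label (0, 1, 2, …), and returns the results as a new column. An operation between a column and a single number, like `* 100`, applies that number to every row.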
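To see the row-by-row behaviour in isolation, here is a small sketch that copies three rows from the table in this section into a fresh dataframe (the names `sample`, `a` and `b` are hypothetical, chosen only for illustration) and checks that the column arithmetic gives the same numbers as the spreadsheet-style, one-row-at-a-time calculation. The last part shows that values are paired by row label, not by position.

```python
import pandas as pd

# Three rows copied from the table in this section; 'sample' is a hypothetical name.
sample = pd.DataFrame({
    'Country': ['Angola', 'Brazil', 'China'],
    'Population (1000s)': [21472, 200362, 1393337],
    'TB deaths': [6900, 4400, 41000],
})

# Column arithmetic: one expression, applied to every row.
rates = sample['TB deaths'] * 100 / sample['Population (1000s)']

# Spreadsheet-style alternative: the same formula repeated for each row.
loop_rates = []
for i in range(len(sample)):
    deaths = sample.loc[i, 'TB deaths']
    population = sample.loc[i, 'Population (1000s)']
    loop_rates.append(deaths * 100 / population)

print(rates.round(6).tolist())
print([round(r, 6) for r in loop_rates])

# Pairing is by row label: b lists its labels in a different order,
# yet label 0 of a is still divided by label 0 of b, and so on.
a = pd.Series([10, 20, 30], index=[0, 1, 2])
b = pd.Series([10, 5, 1], index=[2, 1, 0])
print((a / b).sort_index().tolist())
```

Had the values been paired by position instead, the last line would have divided 10 by 10 and 30 by 1; pairing by label is what keeps each country's deaths matched with its own population even if the rows get reordered.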
All well and good, but how do I put that new column into the dataframe, so that everything is in a single table? In an assignment variable = expression, if the variable hasn’t been mentioned before, the computer creates the variable and stores the expression’s value in it. Likewise, if I assign to a column that doesn’t exist in the dataframe, the computer creates it.
In []:
data['TB deaths (per 100,000)'] = rateColumn
data
Out[]:
 | Country | Population (1000s) | TB deaths | TB deaths (per 100,000)
---|---|---|---|---
0 | Angola | 21472 | 6900 | 32.134873
1 | Brazil | 200362 | 4400 | 2.196025
2 | China | 1393337 | 41000 | 2.942576
3 | Equatorial Guinea | 757 | 67 | 8.850727
4 | Guinea-Bissau | 1704 | 1200 | 70.422535
5 | India | 1252140 | 240000 | 19.167186
6 | Mozambique | 25834 | 18000 | 69.675621
7 | Portugal | 10608 | 140 | 1.319759
8 | Russian Federation | 142834 | 17000 | 11.901928
9 | Sao Tome and Principe | 193 | 18 | 9.326425
10 | South Africa | 52776 | 25000 | 47.370017
11 | Timor-Leste | 1133 | 990 | 87.378641
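The sketch below, on a hypothetical two-row dataframe called `table` (rows copied from above), contrasts the plain column assignment used in this section with pandas's `DataFrame.assign` method. `assign` is an alternative not used in this section: it returns a new dataframe with the extra column and leaves the original untouched. The sketch also shows that assigning to a column name that already exists replaces that column's values rather than adding a second column.

```python
import pandas as pd

# Two rows copied from the table above; 'table' is a hypothetical name.
table = pd.DataFrame({
    'Country': ['Portugal', 'Timor-Leste'],
    'Population (1000s)': [10608, 1133],
    'TB deaths': [140, 990],
})
rate = table['TB deaths'] * 100 / table['Population (1000s)']

# assign() builds a new dataframe with the extra column; 'table' is unchanged.
# A column name containing spaces and commas is passed through a dictionary.
with_rate = table.assign(**{'TB deaths (per 100,000)': rate})
print(table.columns.tolist())      # still the original three columns
print(with_rate.columns.tolist())  # the three columns plus the rate

# Plain assignment, as in this section: the name is new, so the column is created.
table['TB deaths (per 100,000)'] = rate

# Assigning again to an existing name replaces its values (here, rounded to 1 decimal).
table['TB deaths (per 100,000)'] = rate.round(1)
print(table)
```

Plain assignment is handy in an interactive notebook where you keep building up one table; `assign` suits cases where you want to keep the original dataframe as it was.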
That’s it! I’ve written all the code needed to answer the questions I had. Next I’ll write up the analysis into a succinct and stand-alone notebook that can be shared with friends, family and colleagues or the whole world. You’ll find that in the next section.