Learn to code for data analysis

1.4 Calculations on a column

Having selected the column with the number of deaths per country, I’ll add up its values with the appropriately named sum() method to obtain the total number of deaths.

A method is a function that can only be called in a certain context. In this course, the context will mostly be a dataframe or a column. A method call looks like a function call, but adds the context in which to call the method: context.methodName(argument1, argument2, ...). In other words, a dataframe method can only be called on dataframes, and a column method only on columns. Because methods are functions, a method call returns a value and is therefore an expression.

If all that sounded too abstract, here’s how to call the sum() method on the TB deaths column. Note that sum() doesn’t need any arguments because all the values are in the column.

In []:

tbColumn = data['TB deaths']

tbColumn.sum()

Out[]:

354715

The estimated total number of deaths due to TB in 2013 in the BRICS and Portuguese-speaking countries was over 350 thousand. An impressive number, for the wrong reasons.

Calculating the minimum and maximum number of deaths is done in a similar way.

In []:

tbColumn.min()

Out[]:

18

In []:

tbColumn.max()

Out[]:

240000

Like sum(), the column methods min() and max() don’t need arguments, whereas the Python functions min() and max() needed them, because there was no context (column) providing the values.
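As a minimal sketch of this difference, the following cell uses a small made-up column (the name exampleColumn and its values are illustrative assumptions, not the course data) and compares the column methods with Python’s built-in functions, which must be given the values as an argument.

```python
import pandas as pd

# Hypothetical values for illustration only; not the TB data used in the course.
exampleColumn = pd.Series([18, 240, 5650, 1200])

# Column methods: the column is the context, so no arguments are needed.
methodMin = exampleColumn.min()
methodMax = exampleColumn.max()

# Built-in Python functions: there is no context, so the values are passed in.
functionMin = min(exampleColumn)
functionMax = max(exampleColumn)

print(methodMin, functionMin)
print(methodMax, functionMax)
```

Both approaches should give the same smallest and largest values; only the way the values reach the function differs.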

The average number is computed as before, by dividing the total by the number of countries, which is 12 here.

In []:

tbColumn.sum() / 12

Out[]:

29559.583333333332

This kind of average is called the mean and there’s a method for that.

In []:

tbColumn.mean()

Out[]:

29559.583333333332
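To illustrate why the two results agree, here is a small sketch on a made-up column (exampleColumn and its values are assumptions for illustration, not the course data). Instead of typing the number of rows by hand, it uses Python’s len() function to count them. The sketch assumes the column has no missing values.

```python
import pandas as pd

# Hypothetical values for illustration only; not the TB data used in the course.
exampleColumn = pd.Series([10, 20, 30, 60])

# Mean computed by hand: total divided by the number of values in the column.
manualMean = exampleColumn.sum() / len(exampleColumn)

# Mean computed with the column method.
methodMean = exampleColumn.mean()

print(manualMean)
print(methodMean)
```

Using len() rather than a hard-coded number means the calculation still works if rows are later added to or removed from the data.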

Another kind of average measure is the median, which is the number in the middle, i.e. half of the values are above the median and half below it.

In []:

tbColumn.median()

Out[]:

5650.0

The mean is more than five times the median. While half the countries had fewer than 5650 deaths in 2013, some countries had far more, which pushes the mean up.

The median is probably closer to the intuition you have of what ‘average’ should mean (pun intended). News reports don’t always make clear what average measure is being used, and using the mean may distort reality. For example, the mean household income in a country will be influenced by very poor and very rich households, whereas the median income doesn’t take into account how poor or rich the extremes are: half the households are always below the median and half above it.
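The household income example can be sketched with made-up numbers (the incomes below are purely hypothetical). The two columns differ only in the richest household, so comparing their means and medians shows which measure reacts to the extreme value.

```python
import pandas as pd

# Hypothetical yearly household incomes, for illustration only.
incomes = pd.Series([12000, 18000, 20000, 22000, 25000])

# The same households, except that the richest one now earns far more.
incomesWithRich = pd.Series([12000, 18000, 20000, 22000, 2500000])

# The mean changes a lot when one extreme value changes...
meanBefore = incomes.mean()
meanAfter = incomesWithRich.mean()

# ...whereas the median stays the same, because the middle value is unchanged.
medianBefore = incomes.median()
medianAfter = incomesWithRich.median()

print(meanBefore, meanAfter)
print(medianBefore, medianAfter)
```

In this sketch a single very rich household pulls the mean far above what most households earn, while the median is unaffected.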

Put this learning into practice in Exercise 7.

Exercise 7 Calculations on a column

Practise the use of column methods by applying them to the population column you obtained in Exercise 6 in Exercise notebook 1. Remember to run all code before doing the exercise.