Learn to code for data analysis

1.4 Calculations on a column

Having selected the column with the number of deaths per country, I’ll add them with the appropriately named sum() method to obtain the overall total deaths.

A method is a function that can only be called in a certain context. In this course, the context will mostly be a dataframe or a column: a dataframe method can only be called on dataframes, a column method only on columns. A method call looks like a function call, but is prefixed with the context in which to call the method: context.methodName(argument1, argument2, ...). Because methods are functions, a method call returns a value and is therefore an expression.

If all that sounded too abstract, here’s how to call the sum() method on the TB deaths column. Note that sum() doesn’t need any arguments because all the values are in the column.

In []:

tbColumn = data['TB deaths']

tbColumn.sum()

Out[]:

354715

The estimated total number of deaths due to TB in 2013 in the BRICS and Portuguese-speaking countries was over 350 thousand. An impressive number, for the wrong reasons.

Calculating the minimum and maximum number of deaths is done in a similar way.

In []:

tbColumn.min()

Out[]:

18

In []:

tbColumn.max()

Out[]:

240000

Like sum(), the column methods min() and max() don’t need arguments, whereas the Python functions min() and max() did need them, because there was no context (column) providing the values.
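To make the contrast concrete, here is a minimal sketch that uses a small made-up column rather than the TB data. The names values and column, and the numbers themselves, are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical numbers, not taken from the TB dataset.
values = [7, 3, 12, 5]

# Python functions: there is no context, so the values are passed as an argument.
smallest = min(values)
largest = max(values)

# Column methods: the column is the context, so no argument is needed.
column = pd.Series(values)
print(smallest == column.min())
print(largest == column.max())
```

Both comparisons should print True, because the function and the method compute the same result and differ only in where the values come from.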

The average number is computed as before, dividing the total by the number of countries.

In []:

tbColumn.sum() / 12

Out[]:

29559.583333333332

This kind of average is called the mean and there’s a method for that.

In []:

tbColumn.mean()

Out[]:

29559.583333333332
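The division above hard-codes the number of countries. As a hedged sketch, the same check can be written so that the count comes from the column itself. The column below is invented for illustration and is not the course's data.

```python
import pandas as pd

# Hypothetical column of four values, for illustration only.
column = pd.Series([10, 20, 30, 100])

# Dividing by len(column) avoids hard-coding the number of rows.
manual_mean = column.sum() / len(column)

print(manual_mean)      # 40.0
print(column.mean())    # 40.0
```

The two results agree here because the column has no missing values. If it did, len() would still count those rows while mean() skips them by default, so the two numbers could differ.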

Another kind of average measure is the median, which is the number in the middle, i.e. half of the values are above the median and half below it.

In []:

tbColumn.median()

Out[]:

5650.0

The mean is more than five times the median. While half the countries had fewer than 5650 deaths in 2013, some countries had far more, which pushes the mean up.

The median is probably closer to the intuition you have of what ‘average’ should mean (pun intended). News reports don’t always make clear which average measure is being used, and using the mean may distort reality. For example, the mean household income in a country will be influenced by very poor and very rich households, whereas the median income doesn’t take into account how poor or rich the extremes are: half the households will always be below the median and half above it.
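The household income example can be sketched with a few invented figures. The incomes below are assumptions chosen only to show the effect. They are not real statistics.

```python
import pandas as pd

# Hypothetical household incomes, invented for illustration.
incomes = pd.Series([20000, 25000, 30000, 35000, 40000])
print(incomes.mean(), incomes.median())

# The same households, except the richest one is now far richer.
skewed = pd.Series([20000, 25000, 30000, 35000, 1000000])
print(skewed.mean(), skewed.median())
```

In this sketch the median stays at 30000 in both cases, while the mean rises from 30000 to 222000 because of a single extreme value.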

Put this learning into practice in Exercise 7.

Exercise 7 Calculations on a column

Practise the use of column methods by applying them to the population column you obtained in Exercise 6 in the Exercise notebook 1. Remember to run all code before doing the exercise.
