Learn to code for data analysis

1.3 Selecting a column

Now you have the data, let the analysis begin!

Figure 3: Free-standing Roman columns against a blue sky

Let’s tackle the first part of the first question: ‘What are the total, smallest, largest and average number of deaths due to TB?’ Obtaining the total number will be done in two steps: first select the column with the TB deaths, then sum the values in that column.

Selecting a single column of a dataframe is done with an expression in the format: dataFrame['column name'].

In []:

data['TB deaths']

Out[]:

0       6900
1       4400
2      41000
3         67
4       1200
5     240000
6      18000
7        140
8      17000
9         18
10     25000
11       990
Name: TB deaths, dtype: int64
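The cell above relies on `data` having been loaded earlier. As a minimal, self-contained sketch of the same selection, the code below builds a small stand-in dataframe. The country labels are invented for illustration, and the three values are simply the first three rows of the output above. It also shows that selecting a single column gives back a pandas Series rather than a dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the course dataframe: the 'Country' labels are
# invented; the values are the first three rows of the output shown above.
data = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'TB deaths': [6900, 4400, 41000],
})

# Select a single column by its exact name.
tb_column = data['TB deaths']

# A single selected column is a pandas Series that keeps the column name.
print(type(tb_column).__name__)
print(tb_column.name)
print(tb_column.tolist())
```

The variable name `tb_column` is only an illustrative choice; any valid Python name would do.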

Strings are verbatim text, so the column name must be written exactly as it appears in the dataframe, which you saw after loading the data. The slightest deviation, such as a different capital letter or an extra space, leads to a key error, which can be seen as a kind of name error. You can try out in the Week 2 exercise notebook what happens when you misspell the column name. The error message is horribly long. In such cases, just skip to the last line of the error message to see the type of error.
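To see the key error without wading through the long message, one option is to catch it and print only its type and the offending name. This is a rough sketch using a hypothetical two-row dataframe, not the course data, and the misspelling (a capital 'D') is chosen purely for illustration.

```python
import pandas as pd

# Hypothetical two-row dataframe, used only to trigger the error.
data = pd.DataFrame({'TB deaths': [6900, 4400]})

error_name = None
try:
    # Wrong capitalisation: 'Deaths' instead of 'deaths'.
    data['TB Deaths']
except KeyError as error:
    # Keep just the error type, which is what the last line
    # of the full error message reports.
    error_name = type(error).__name__
    print(error_name, error)
```

In a notebook you would normally not wrap the selection in `try`/`except`; it is used here only so the example runs to the end and shows the error type on one line.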

Put this learning into practice in Exercise 6.

Exercise 6: Selecting a column

In your Exercise notebook 1, select the population column and store it in a variable, so that you can use it in later exercises.

Remember that to open the notebook you’ll need to launch Anaconda and then navigate to the notebook using Jupyter. Once it’s open, run all the code.

Next, you’ll learn about making calculations on a column.