1.3 Selecting a column
Now that you have the data, let the analysis begin!
Let’s tackle the first part of the first question: ‘What are the total, smallest, largest and average number of deaths due to TB?’ Obtaining the total number will be done in two steps: first select the column with the TB deaths, then sum the values in that column.
Selecting a single column of a dataframe is done with an expression in the format: dataFrame['column name'].
In []:
data['TB deaths']
Out[]:
0 6900
1 4400
2 41000
3 67
4 1200
5 240000
6 18000
7 140
8 17000
9 18
10 25000
11 990
Name: TB deaths, dtype: int64
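If you want to try the selection outside the course notebook, the following is a minimal sketch. It assumes a small stand-in dataframe: the 'TB deaths' values are the first three from the output above, while the 'Label' column and the variable name tb_column are made up for illustration and are not part of the course data.

```python
import pandas as pd

# Hypothetical stand-in for the course dataframe. The 'TB deaths' values are
# the first three shown in the output above; the 'Label' column is invented.
data = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'TB deaths': [6900, 4400, 41000],
})

# Select a single column by writing its exact name between square brackets.
tb_column = data['TB deaths']

print(type(tb_column).__name__)  # Series
print(tb_column.tolist())        # [6900, 4400, 41000]
```

The result of selecting one column is a pandas Series, which is why the output ends with the column's name and data type rather than looking like a table.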
Strings are verbatim text, which means that the column name must be written exactly as given in the dataframe, which you saw after loading the data. The slightest deviation, even in capitalisation or spacing, leads to a key error, which can be seen as a kind of name error. You can try out in the Week 2 exercise notebook what happens when you misspell the column name. The error message is very long. In such cases, just skip to the last line of the error message to see the type of error.
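To see the key error without reading the full message, here is a hedged sketch that catches it and prints only its type. The one-column dataframe is a hypothetical stand-in for the course data, and the variable name error_type is made up for this example.

```python
import pandas as pd

# Hypothetical stand-in dataframe with a single 'TB deaths' column.
data = pd.DataFrame({'TB deaths': [6900, 4400, 41000]})

error_type = None
try:
    # 'TB Deaths' has a capital D, so it does not match the column name.
    data['TB Deaths']
except KeyError as error:
    # Keep just the kind of error, as the last line of the full message would show.
    error_type = type(error).__name__
    error_text = str(error)

print(error_type)  # KeyError
```

In a notebook you would normally not catch the error at all. The try and except lines are used here only so that the sketch runs to the end and shows the error type on its own.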
Put this learning into practice in Exercise 6.
Exercise 6 Selecting a column
In your Exercise notebook 1, select the population column and store it in a variable, so that you can use it in later exercises.
Remember that to open the notebook you’ll need to launch Anaconda and then navigate to the notebook using Jupyter. Once it’s open, run all the code.
Next, you’ll learn about making calculations on a column.