1 Enter the pandas

As you probably realised, this way of coding is not practical for large-scale data analysis.

Figure 1 Four giant panda cubs climbing a bamboo fence

Three lines of code were required for each country, to store the number of deaths, store the population, and calculate the death rate. With roughly 200 countries in the world, my trivial analysis would require 400 variables and typing almost 600 lines of code! Life’s too short to be spent that way.

Instead of using a separate variable for each datum, it is better to organise data as a table of rows and columns.

Table 1

Country    Deaths  Population
Angola       6900       21472
Brazil       4400      200362
Portugal      140       10608

In that way, instead of 400 variables, I only need one that stores the whole table. Instead of writing a mile-long expression that adds 200 variables to obtain the total deaths, I’ll write a short expression that calculates the total of the ‘Deaths’ column, no matter how many countries (rows) there are.

To organise data into tables and do calculations on such tables, you and I will use the pandas module, which is included in Anaconda and CoCalc. A module is a package of various pieces of code that can be used individually. The pandas module provides very extensive and advanced data analysis capabilities to complement Python. This course only scratches the surface of pandas.

I have to tell the computer that I’m going to use a module.

from pandas import *

That line of code is an import statement: from the pandas module, import everything. In plain English: load into memory all pieces of code that are in the pandas module, so that I can use any of them. In the above statement, the asterisk isn’t the multiplication operator but instead means ‘everything’.

Each weekly project in this course will start with this import statement, because all projects need the pandas module.
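To give a feel for what the module makes possible, here is a minimal sketch of Table 1 stored in a single variable, with the total of the ‘Deaths’ column calculated in one short expression. It assumes the DataFrame type and its sum() method, neither of which has been introduced at this point in the course, and it imports only the one name it needs rather than everything. The variable names table and total_deaths are made up for this illustration.

```python
# Import just the DataFrame type; 'from pandas import *' would load it too.
from pandas import DataFrame

# Table 1, written as one list of values per column.
# (Assumption: building the table by hand; the values are copied from Table 1.)
table = DataFrame({
    'Country': ['Angola', 'Brazil', 'Portugal'],
    'Deaths': [6900, 4400, 140],
    'Population': [21472, 200362, 10608],
})

# One short expression gives the column total, however many rows there are.
total_deaths = table['Deaths'].sum()
print(total_deaths)  # prints 11440
```

Adding more countries only means adding more rows to the table; the expression that calculates the total stays the same.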

The words from and import are reserved words: they can’t be used as variable, function or module names. If you try, you will get a syntax error.

from = 100

  File "<ipython-input-23-6958f0ebc10d>", line 1
    from = 100
         ^
SyntaxError: invalid syntax

Jupyter notebooks show reserved words in boldface font to make them easier to spot. If you see a boldface name in an assignment (as you will for the code above), you must choose a different name.
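If you are unsure whether a name is reserved, you can also ask Python directly. The sketch below uses the keyword module from Python’s standard library, which is separate from pandas and is not otherwise used in this course. The names candidates and reserved are made up for this illustration.

```python
import keyword  # standard library module that knows Python's reserved words

# Names I might be tempted to use for variables.
candidates = ['from', 'import', 'deaths']

# Keep only those that Python reserves for itself.
reserved = [name for name in candidates if keyword.iskeyword(name)]
print(reserved)  # prints ['from', 'import']
```

Any name that appears in the printed list can’t be used in an assignment, so a different name must be chosen.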

Exercise 5 pandas

Use Exercise 5 in the Exercise notebook 1 to help you answer these questions about errors you might come across.

1. What kind of error will you get if you misspell 'pandas' as 'Pandas'?

2. What kind of error will you get if you misspell 'import' as 'impart'?

3. What kind of error will you get if you forget the asterisk?