Learn to code for data analysis

Week 2: Having a go at it Part 2

1 Enter the pandas

As you probably realised, this way of coding is not practical for large-scale data analysis.

Figure 1: Four giant panda cubs climbing a bamboo fence

Three lines of code were required for each country, to store the number of deaths, store the population, and calculate the death rate. With roughly 200 countries in the world, my trivial analysis would require 400 variables and typing almost 600 lines of code! Life’s too short to be spent that way.

Instead of using a separate variable for each datum, it is better to organise data as a table of rows and columns.

Table 1
Country     Deaths   Population
Angola        6900        21472
Brazil        4400       200362
Portugal       140        10608

That way, instead of 400 variables, I only need one that stores the whole table. Instead of writing a mile-long expression that adds 200 variables to obtain the total deaths, I'll write a short expression that calculates the total of the 'Deaths' column, no matter how many countries (rows) there are.
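To make the idea concrete before pandas enters, here is a minimal sketch in plain Python: Table 1 is held in a single variable as a list of rows, and one short expression totals the 'Deaths' column. The variable name table and the row layout (country, deaths, population) are illustrative assumptions, not the course's own code.

```python
# Table 1 in one variable: a list of rows, each row being (country, deaths, population)
table = [
    ('Angola', 6900, 21472),
    ('Brazil', 4400, 200362),
    ('Portugal', 140, 10608),
]

# Total of the 'Deaths' column (position 1 in each row); works for any number of rows
totalDeaths = sum(row[1] for row in table)
print(totalDeaths)  # 11440
```

Adding a fourth country means adding one row to the list; the expression that computes the total stays the same.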

To organise data into tables and do calculations on such tables, you and I will use the pandas module, which is included in Anaconda and CoCalc. A module is a package of various pieces of code that can be used individually. The pandas module provides very extensive and advanced data analysis capabilities to complement Python. This course only scratches the surface of pandas.

I have to tell the computer that I’m going to use a module.

In []:

from pandas import *

That line of code is an import statement: from the pandas module, import everything. In plain English: load into memory all pieces of code that are in the pandas module, so that I can use any of them. In the above statement, the asterisk isn’t the multiplication operator but instead means ‘everything’.

Each weekly project in this course will start with this import statement, because all projects need the pandas module.

The words from and import are reserved words: they can't be used as variable, function or module names. If you try, you will get a syntax error.

In []:

from = 100

  File "<ipython-input-23-6958f0ebc10d>", line 1
    from = 100
         ^
SyntaxError: invalid syntax

Jupyter notebooks show reserved words in boldface font to make them easier to spot. If you see a boldface name in an assignment (as you will for the code above), you must choose a different name.
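Besides spotting boldface names, you can ask Python directly whether a word is reserved. This sketch uses the standard keyword module and compile(), which checks a piece of code for syntax errors without running it; the variable name raisesSyntaxError is just an illustrative choice.

```python
import keyword

# Reserved words such as 'from' and 'import' cannot be used as names
print(keyword.iskeyword('from'))    # True
print(keyword.iskeyword('import'))  # True
print(keyword.iskeyword('pandas'))  # False

# compile() only checks the syntax, so the error can be caught instead of stopping the program
try:
    compile('from = 100', '<cell>', 'exec')
    raisesSyntaxError = False
except SyntaxError as error:
    raisesSyntaxError = True
    print('SyntaxError:', error.msg)
```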

Exercise 5 pandas

a. A syntax error

b. A name error, reported as an import error

The correct answer is b. The computer is expecting a name but there is no module with the name 'Pandas' in the Anaconda distribution. Remember that names are case-sensitive.
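To see the case-sensitivity for yourself, this sketch asks Python to import a module called 'Pandas' (capital P) using the standard importlib module. Python's import system compares module names case-sensitively, so the attempt fails with ModuleNotFoundError, which is a kind of ImportError. The variable name foundModule is illustrative.

```python
import importlib

# 'Pandas' and 'pandas' are different names, so this import fails
try:
    importlib.import_module('Pandas')
    foundModule = True
except ImportError as error:
    foundModule = False
    # The error raised is ModuleNotFoundError, a subclass of ImportError
    print(type(error).__name__, '-', error)
```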


a. A name error

b. A syntax error

The correct answer is b. The computer is expecting a reserved word and anything else will raise a syntax error.


a. A name error

b. A syntax error

The correct answer is b. The statement cannot end with the reserved word 'import'; the computer is expecting an indication of what to import.
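A short check of this answer, as a sketch: compile() examines an import statement that ends with 'import' without executing it, so pandas need not even be installed. Python rejects the incomplete statement with a SyntaxError. The variable name incomplete is illustrative.

```python
# An import statement that ends with 'import' does not say what to import
try:
    compile('from pandas import', '<cell>', 'exec')
    incomplete = False
except SyntaxError:
    incomplete = True

print(incomplete)  # True
```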