Learn to code for data analysis

1.2 Variables and assignments

With the choice of data and questions confirmed, the coding can begin.

Figure 2: An image of a lot of boxes stacked up

To introduce the basics of coding, I will show you a very simple approach, only suitable for the smallest of datasets. Please bear with me. In the second part of the week I will show you the proper approach. Read through this step and the next – you’re not expected to write code just yet. In Exercise 1, a bit further on in this week, you’ll be asked to start writing code.

OK, let’s start. I want the computer to calculate the total number of deaths in 2013. For the computer to do that, it must first be told the number of deaths in each country in that year. I’ll start with my home country.

In []:

deathsInPortugal = 100

The ‘In []:’ line is Jupyter’s way of saying that what follows is code I typed in. And there it is: the first line of code! It is a command to the computer that could be translated to English as: ‘find in the attic an empty box, put the number 100 in the box, and write “deathsInPortugal” on the box’. (Aren’t you glad Python is more succinct than English?) In coding jargon, the attic is the computer’s memory, boxes are called variables (I’ll explain why shortly), what’s written on a box is the variable’s name, and storing a value in a variable is called an assignment.

By naming the boxes, I can later ask the computer to show the value in box thingamajig or take the values in boxes stuff and moreStuff and add them together.

To see what’s inside a box, I can just write the name of the box on a line of its own. Jupyter will write the variable’s value on the screen, preceded by ‘Out[]’, to clearly mark the output generated by the code. When you start to use the Jupyter notebooks, you will see numbers inside the square brackets, i.e. In[1], In[2], etc., to indicate in which order the various pieces of code are being executed. Here we have omitted the numbers to avoid confusion between what you see here and what you see in your notebook.

In []:

deathsInPortugal

Out[]:

100
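
The other thing naming makes possible, adding the values in two boxes, could look something like the sketch below. The names stuff, moreStuff and total, and the values 3 and 4, are made up purely for illustration. The print call is an addition so that the result is displayed even outside a Jupyter notebook.

```python
# Hypothetical names and made-up values, used only to illustrate the idea.
stuff = 3
moreStuff = 4

# Take the values in the two boxes, add them, and store the sum in a third box.
total = stuff + moreStuff

# Display what is now stored in the box named total.
print(total)
```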

Each assignment is written on a line of its own. The computer executes the assignments line by line, from top to bottom. Thus, the program would continue as follows:

In []:

deathsInPortugal = 100

deathsInAngola = 200

deathsInBrazil = 300

I don’t think I need to continue; you get the gist.

By the way, all numbers so far are fictitious. If I use real data, taken from the World Health Organization website, you’ll see a difference.

In []:

deathsInPortugal = 140

deathsInAngola = 6900

deathsInBrazil = 4400

deathsInPortugal

Out[]:

140

Notice what happened. When a value is assigned to an already existing variable, the value stored in that variable is unceremoniously chucked away and replaced by the new value. In the example, the second group of assignments replaced the values assigned by the first group and thus the current value of deathsInPortugal is 140 and no longer 100. That’s why the storage boxes are called variables: their content can vary over time.
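
A short sketch can show the replacement happening in a single run. It reuses the fictitious value 100 and the value 140 from the examples above. The print calls are an addition, assumed here only so that both the old and the new value are displayed one after the other.

```python
# First assignment: the box named deathsInPortugal now holds 100 (fictitious).
deathsInPortugal = 100
print(deathsInPortugal)  # displays 100

# Second assignment to the same name: 100 is discarded and replaced by 140.
deathsInPortugal = 140
print(deathsInPortugal)  # displays 140
```

After the second assignment there is no way to get the 100 back from this variable; only the most recent value is kept.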

To sum up, a variable is a named storage location for values, and an assignment takes the value on the right-hand side of the equals sign (=) and stores it in the variable on the left-hand side.

In the next section, you will find out the importance of naming in Python.