Learn to code for data analysis

1.1 Creating the data

I won’t yet work with the full data. Instead I will create small tables, to better illustrate this week’s concepts and techniques.

Small tables make it easier to see what is going on and to create specific data combination and transformation scenarios that test the code.

There are many ways of creating tables in pandas. One of the simplest is to define the rows as a list, with the first element of the list being the first row, the second element being the second row, etc.

Each row of a table has multiple cells, one for each column. The obvious way to represent a row is therefore also as a list, the first element being the cell in the first column, the second element being the cell in the second column, etc. To sum up, the table is represented as a list of lists.

Here is a table of the 2013 GDP of some countries, in US dollars:

In []:

table = [
    ['UK', 2678454886796.7],        # 1st row
    ['USA', 16768100000000.0],      # 2nd row
    ['China', 9240270452047.0],     # and so on...
    ['Brazil', 2245673032353.8],
    ['South Africa', 366057913367.1]
]
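To make the list-of-lists idea concrete, here is a minimal sketch of how rows and cells can be picked out of such a table with plain Python indexing, before any dataframe is involved. The names `first_row`, `country` and `gdp_value` are only illustrative, and the table is shortened to two rows.

```python
# A shortened list of lists: each inner list is one row of the table.
table = [
    ['UK', 2678454886796.7],
    ['USA', 16768100000000.0],
]

first_row = table[0]      # the whole first row, itself a list
country = table[0][0]     # first row, first column
gdp_value = table[1][1]   # second row, second column

print(first_row)
print(country)
print(len(table), len(table[0]))  # number of rows, number of columns
```

Python counts from zero, so `table[0]` is the first row and `table[1][1]` is the second cell of the second row.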

To create a dataframe, I use a pandas function appropriately called DataFrame(). I have to give it two arguments: the names of the columns and the data itself. The column names are given as a list of strings, the first string being the first column name, etc.

In []:

from pandas import DataFrame  # only needed if pandas was not imported earlier in the notebook

headings = ['Country', 'GDP (US$)']
gdp = DataFrame(columns=headings, data=table)
gdp

Out[]:

        Country     GDP (US$)
0            UK  2.678455e+12
1           USA  1.676810e+13
2         China  9.240270e+12
3        Brazil  2.245673e+12
4  South Africa  3.660579e+11
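A quick way to check that the headings and rows ended up where intended is to look at the dataframe's shape and column names. The sketch below assumes pandas is installed and uses a shortened two-row table; `small_table` and `small_gdp` are illustrative names, not part of the course notebook.

```python
from pandas import DataFrame

headings = ['Country', 'GDP (US$)']
small_table = [
    ['UK', 2678454886796.7],
    ['USA', 16768100000000.0],
]
small_gdp = DataFrame(columns=headings, data=small_table)

n_rows, n_columns = small_gdp.shape        # (rows, columns)
column_names = list(small_gdp.columns)     # headings, in the order given
countries = list(small_gdp['Country'])     # values keep the row order of the list

print(n_rows, n_columns)
print(column_names)
print(countries)
```

Each heading lines up with the cell in the same position of every row, so the list of headings needs as many entries as each row has cells.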

Note that pandas shows large numbers in scientific notation, where, for example, 3e+12 means 3×10^12, i.e. a 3 followed by 12 zeros.
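The same notation can be tried out in plain Python, independently of pandas. This small sketch uses Python's built-in string formatting to switch between the scientific and the written-out form; the variable names are only illustrative.

```python
value = 3e+12                                   # Python accepts scientific notation in code
same_number = (value == 3 * 10**12)             # compare with 3 times 10 to the power 12
written_out = '{:,.0f}'.format(value)           # no decimals, commas between thousands
scientific = '{:e}'.format(366057913367.1)      # South Africa's GDP in scientific notation

print(same_number)   # True
print(written_out)   # 3,000,000,000,000
print(scientific)    # 3.660579e+11
```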

I define a similar table for the life expectancy, based on the 2013 World Bank data.

In []:

headings = ['Country name', 'Life expectancy (years)']
table = [
    ['China', 75],
    ['Russia', 71],
    ['United States', 79],
    ['India', 66],
    ['United Kingdom', 81]
]
life = DataFrame(columns=headings, data=table)
life

Out[]:

     Country name  Life expectancy (years)
0           China                       75
1          Russia                       71
2   United States                       79
3           India                       66
4  United Kingdom                       81

To illustrate potential issues when combining multiple datasets, I’ve taken a different set of countries, with common countries in a different order. Moreover, to illustrate a non-numeric conversion, I’ve abbreviated country names in one table but not the other.
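The mismatch between the two tables can be made visible with plain Python sets. This is only a sketch to show the problem, not the combination technique itself; the lists simply repeat the country names from the two tables, and the variable names are illustrative.

```python
gdp_countries = ['UK', 'USA', 'China', 'Brazil', 'South Africa']
life_countries = ['China', 'Russia', 'United States', 'India', 'United Kingdom']

# Names that are spelled identically in both tables.
common = set(gdp_countries) & set(life_countries)

# Names that appear in the GDP table only, sorted for a stable display.
only_in_gdp = sorted(set(gdp_countries) - set(life_countries))

print(common)        # {'China'}
print(only_in_gdp)   # ['Brazil', 'South Africa', 'UK', 'USA']
```

Only China matches exactly. The UK and the USA are in both tables, but because their names are abbreviated in one and written in full in the other, a comparison of the raw strings treats them as different countries.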

Exercise 1 Creating the data

Open exercise notebook 3 and save it in the disk folder, or upload it to the CoCalc project, that you created in Week 1. Then practise creating dataframes in Exercise 1.

If you’re using Anaconda, remember that to open the notebook you’ll need to navigate to it using Jupyter. Whether you’re using Anaconda or CoCalc, once the notebook is open, run the existing code before you start the exercise. When you’ve completed the exercise, save the notebook. If you need a quick reminder of how to use Jupyter, watch again the video in Week 1 Exercise 1.
