{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise notebook 1: Having a go at it\n", "\n", "This Jupyter notebook, for the first part of The Open University's _Learn to Code for Data Analysis_ course, contains code examples and coding activities for you.\n", "\n", "You'll come across steps in the course directing you to this notebook. Once you've done each exercise, go back to the corresponding step and mark it as complete. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# this code conceals irrelevant warning messages\n", "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1: variables and assignments\n", "\n", "A **variable** is a named storage for values. An **assignment** takes a value (like the number 100 below) and stores it in a variable (`deathsInPortugal` below)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "deathsInPortugal = 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To display the value stored in a variable, write the name of the variable. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsInPortugal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each variable can store one value at any time, but the value stored can vary over time, by assigning a new value to the variable." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "140" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsInPortugal = 100\n", "deathsInPortugal = 140\n", "deathsInPortugal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each assignment is written on a separate line. The computer executes the assignments one line at a time, from top to bottom." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "deathsInPortugal = 140\n", "deathsInAngola = 6900\n", "deathsInBrazil = 4400" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "Add assignments to the code cell above (or in a new code cell) for the estimated deaths by TB in 2013 in the remaining BRICS countries. The values are as follows: Russia 17000, India 240000, China 41000, South Africa 25000. \n", "\n", "Don't forget to run the code cell, so that the new variables are available for the exercises further below.\n", "\n", "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2: expressions\n", "\n", "An **expression** is a fragment of code that has a value. A variable name, by itself, is an expression: the expression's value is the value stored in the variable. In Jupyter notebooks, if the last line of a code cell is an expression, then the computer will show its value when executing the cell. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "140" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsInPortugal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By contrast, a **statement** is a command for the computer to do something. Commands don't produce values, and therefore the computer doesn't display anything." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "deathsInPortugal = 140" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More complex expressions can be written using the arithmetic **operators** of addition (`+`), subtraction (`-`), multiplication (`*`) and division (`/`). For example, the total number of deaths in the three countries is:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11440" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsInAngola + deathsInBrazil + deathsInPortugal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the calculated value needs to be used later on in the code, it has to be stored in a variable. In general, the right-hand side of an assignment is an expression; its value is calculated (the expression is **evaluated**) and stored in the variable." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11440" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "totalDeaths = deathsInAngola + deathsInBrazil + deathsInPortugal\n", "totalDeaths" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average number of deaths is the total divided by the number of countries." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3813.3333333333335" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "totalDeaths / 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average could also be calculated with a single expression." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3813.3333333333335" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(deathsInAngola + deathsInBrazil + deathsInPortugal) / 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parentheses (round brackets) are necessary to state that the sum has to be calculated before the division. Without parentheses, Python follows the conventional order used in mathematics: divisions and multiplications are done before additions and subtractions." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11346.666666666666" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsInAngola + deathsInBrazil + deathsInPortugal / 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "- In the cell below, write code to calculate the total number of deaths in the five BRICS countries (Brazil, Russia, India, China, South Africa) in 2013. Run the code to see the result, which should be 327400." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- In the cell below, write code to calculate the average number of deaths in the BRICS countries in 2013. Run the code to see the result, which should be 65480." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3: functions quiz \n", "\n", "A **function** takes zero or more values (the function's **arguments**) and **returns** (produces) a value. 
To **call** (use) a function, write the function name, followed by its arguments within parentheses (round brackets). Multiple arguments are separated by commas. Function names follow the same rules and conventions as variable names. A function call is an expression: the expression's value is the value returned by the function.\n", "\n", "Python provides two functions to compute the **maximum** (largest) and **minimum** (smallest) of two or more values." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4400" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max(deathsInBrazil, deathsInPortugal)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "140" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "min(deathsInAngola, deathsInBrazil, deathsInPortugal)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **range** of a set of values is the difference between the maximum and the minimum." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4260" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "largest = max(deathsInBrazil, deathsInPortugal)\n", "smallest = min(deathsInBrazil, deathsInPortugal)\n", "deathsRange = largest - smallest\n", "deathsRange" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tasks\n", "\n", "Answer the quiz questions in the course. All of them can be answered by editing the above code cell. Don't forget that you can use TAB-completion to quickly write the variable names of the remaining BRICS countries, namely Russia, India, China and South Africa (Brazil is already in the code above)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 4: comments\n", "\n", "**Comments** start with the hash sign (`#`) and go until the end of the line. They're used to annotate the code, e.g. to indicate the units of values." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.3197586726998491" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# population unit: thousands of inhabitants\n", "populationOfPortugal = 10608\n", "\n", "# deaths unit: inhabitants\n", "deathsInPortugal = 140\n", "\n", "# deaths per 100 thousand inhabitants\n", "deathsInPortugal * 100 / populationOfPortugal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "Calculate the deaths per 100 thousand inhabitants for Brazil. Its population in 2013 was roughly 200 million and 362 thousand people. You should obtain a result of around 2.2 deaths per 100 thousand people." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 5: pandas quiz\n", "\n", "All programs in this course must start with the following **import statement**, to load all the code from the pandas **module**." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pandas import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The words in boldface (`from` and `import`) are **reserved words** of the Python language; they cannot be used as names.\n", "\n", "### Task\n", "\n", "Answer the quiz questions in the course. You can change the above line of code to find out the answers." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 6: selecting a column\n", "\n", "The `read_excel()` function takes a **string** with the name of an Excel file, and returns a **dataframe**, the pandas representation of a table. The computer reports a **file not found** error if the file is not in the same folder as this notebook, or the file name is misspelt." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Angola</td><td>21472</td><td>6900</td></tr>\n",
"    <tr><th>1</th><td>Brazil</td><td>200362</td><td>4400</td></tr>\n",
"    <tr><th>2</th><td>China</td><td>1393337</td><td>41000</td></tr>\n",
"    <tr><th>3</th><td>Equatorial Guinea</td><td>757</td><td>67</td></tr>\n",
"    <tr><th>4</th><td>Guinea-Bissau</td><td>1704</td><td>1200</td></tr>\n",
"    <tr><th>5</th><td>India</td><td>1252140</td><td>240000</td></tr>\n",
"    <tr><th>6</th><td>Mozambique</td><td>25834</td><td>18000</td></tr>\n",
"    <tr><th>7</th><td>Portugal</td><td>10608</td><td>140</td></tr>\n",
"    <tr><th>8</th><td>Russian Federation</td><td>142834</td><td>17000</td></tr>\n",
"    <tr><th>9</th><td>Sao Tome and Principe</td><td>193</td><td>18</td></tr>\n",
"    <tr><th>10</th><td>South Africa</td><td>52776</td><td>25000</td></tr>\n",
"    <tr><th>11</th><td>Timor-Leste</td><td>1133</td><td>990</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "0 Angola 21472 6900\n", "1 Brazil 200362 4400\n", "2 China 1393337 41000\n", "3 Equatorial Guinea 757 67\n", "4 Guinea-Bissau 1704 1200\n", "5 India 1252140 240000\n", "6 Mozambique 25834 18000\n", "7 Portugal 10608 140\n", "8 Russian Federation 142834 17000\n", "9 Sao Tome and Principe 193 18\n", "10 South Africa 52776 25000\n", "11 Timor-Leste 1133 990" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = read_excel('WHO POP TB some.xls')\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The expression `dataFrame[columnName]` evaluates to the column with the given name (a string). Column names are case sensitive. Misspelling the column name will result in a rather long **key error** message. You can see what happens by changing the string in the next code cell (e.g. replace `TB` by `tb`) and running it. Don't forget to undo your change and run the code again." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 6900\n", "1 4400\n", "2 41000\n", "3 67\n", "4 1200\n", "5 240000\n", "6 18000\n", "7 140\n", "8 17000\n", "9 18\n", "10 25000\n", "11 990\n", "Name: TB deaths, dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn = data['TB deaths']\n", "tbColumn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "In the next cell, select the population column and store it in a variable (you'll use it in the next exercise). You need to scroll back to the start of the exercise to see the column's name." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 7: calculations on a column\n", "\n", "\n", "A **method** is a function that can only be called in a certain context, like a dataframe or a column. A **method call** is of the form `context.methodName(argument1, argument2, ...)`. \n", "\n", "Pandas provides several column methods, including ones that calculate the sum, the largest, and the smallest of the numbers in a column, as follows." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "354715" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.sum()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "240000" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.max()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **mean** of a collection of numbers is the sum of those numbers divided by how many there are." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "29559.583333333332" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.sum() / 12" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "29559.583333333332" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **median** of a collection of numbers is the number in the middle, i.e. half of the numbers are below the median and half are above." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5650.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tasks\n", "\n", "Use the population column variable from the previous exercise to calculate:\n", "\n", "- the total population" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- the maximum population" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- the minimum population" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 8: sorting on a column\n", "\n", "The dataframe method `sort_values()` takes as argument a column name and returns a new dataframe, with rows in ascending order according to the values 
in that column." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>9</th><td>Sao Tome and Principe</td><td>193</td><td>18</td></tr>\n",
"    <tr><th>3</th><td>Equatorial Guinea</td><td>757</td><td>67</td></tr>\n",
"    <tr><th>7</th><td>Portugal</td><td>10608</td><td>140</td></tr>\n",
"    <tr><th>11</th><td>Timor-Leste</td><td>1133</td><td>990</td></tr>\n",
"    <tr><th>4</th><td>Guinea-Bissau</td><td>1704</td><td>1200</td></tr>\n",
"    <tr><th>1</th><td>Brazil</td><td>200362</td><td>4400</td></tr>\n",
"    <tr><th>0</th><td>Angola</td><td>21472</td><td>6900</td></tr>\n",
"    <tr><th>8</th><td>Russian Federation</td><td>142834</td><td>17000</td></tr>\n",
"    <tr><th>6</th><td>Mozambique</td><td>25834</td><td>18000</td></tr>\n",
"    <tr><th>10</th><td>South Africa</td><td>52776</td><td>25000</td></tr>\n",
"    <tr><th>2</th><td>China</td><td>1393337</td><td>41000</td></tr>\n",
"    <tr><th>5</th><td>India</td><td>1252140</td><td>240000</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "9 Sao Tome and Principe 193 18\n", "3 Equatorial Guinea 757 67\n", "7 Portugal 10608 140\n", "11 Timor-Leste 1133 990\n", "4 Guinea-Bissau 1704 1200\n", "1 Brazil 200362 4400\n", "0 Angola 21472 6900\n", "8 Russian Federation 142834 17000\n", "6 Mozambique 25834 18000\n", "10 South Africa 52776 25000\n", "2 China 1393337 41000\n", "5 India 1252140 240000" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values('TB deaths')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting doesn't change the original table." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Angola</td><td>21472</td><td>6900</td></tr>\n",
"    <tr><th>1</th><td>Brazil</td><td>200362</td><td>4400</td></tr>\n",
"    <tr><th>2</th><td>China</td><td>1393337</td><td>41000</td></tr>\n",
"    <tr><th>3</th><td>Equatorial Guinea</td><td>757</td><td>67</td></tr>\n",
"    <tr><th>4</th><td>Guinea-Bissau</td><td>1704</td><td>1200</td></tr>\n",
"    <tr><th>5</th><td>India</td><td>1252140</td><td>240000</td></tr>\n",
"    <tr><th>6</th><td>Mozambique</td><td>25834</td><td>18000</td></tr>\n",
"    <tr><th>7</th><td>Portugal</td><td>10608</td><td>140</td></tr>\n",
"    <tr><th>8</th><td>Russian Federation</td><td>142834</td><td>17000</td></tr>\n",
"    <tr><th>9</th><td>Sao Tome and Principe</td><td>193</td><td>18</td></tr>\n",
"    <tr><th>10</th><td>South Africa</td><td>52776</td><td>25000</td></tr>\n",
"    <tr><th>11</th><td>Timor-Leste</td><td>1133</td><td>990</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "0 Angola 21472 6900\n", "1 Brazil 200362 4400\n", "2 China 1393337 41000\n", "3 Equatorial Guinea 757 67\n", "4 Guinea-Bissau 1704 1200\n", "5 India 1252140 240000\n", "6 Mozambique 25834 18000\n", "7 Portugal 10608 140\n", "8 Russian Federation 142834 17000\n", "9 Sao Tome and Principe 193 18\n", "10 South Africa 52776 25000\n", "11 Timor-Leste 1133 990" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data # rows still in original order" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting on a column that has text will put the rows in alphabetical order." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Angola</td><td>21472</td><td>6900</td></tr>\n",
"    <tr><th>1</th><td>Brazil</td><td>200362</td><td>4400</td></tr>\n",
"    <tr><th>2</th><td>China</td><td>1393337</td><td>41000</td></tr>\n",
"    <tr><th>3</th><td>Equatorial Guinea</td><td>757</td><td>67</td></tr>\n",
"    <tr><th>4</th><td>Guinea-Bissau</td><td>1704</td><td>1200</td></tr>\n",
"    <tr><th>5</th><td>India</td><td>1252140</td><td>240000</td></tr>\n",
"    <tr><th>6</th><td>Mozambique</td><td>25834</td><td>18000</td></tr>\n",
"    <tr><th>7</th><td>Portugal</td><td>10608</td><td>140</td></tr>\n",
"    <tr><th>8</th><td>Russian Federation</td><td>142834</td><td>17000</td></tr>\n",
"    <tr><th>9</th><td>Sao Tome and Principe</td><td>193</td><td>18</td></tr>\n",
"    <tr><th>10</th><td>South Africa</td><td>52776</td><td>25000</td></tr>\n",
"    <tr><th>11</th><td>Timor-Leste</td><td>1133</td><td>990</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "0 Angola 21472 6900\n", "1 Brazil 200362 4400\n", "2 China 1393337 41000\n", "3 Equatorial Guinea 757 67\n", "4 Guinea-Bissau 1704 1200\n", "5 India 1252140 240000\n", "6 Mozambique 25834 18000\n", "7 Portugal 10608 140\n", "8 Russian Federation 142834 17000\n", "9 Sao Tome and Principe 193 18\n", "10 South Africa 52776 25000\n", "11 Timor-Leste 1133 990" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values('Country')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "Sort the same table by population, to quickly see which are the least and the most populous countries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now go back to the course step and mark it complete.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final quiz: Calculations over columns\n", "\n", "This information will help you to answer questions in the final quiz.\n", "\n", "The value of an arithmetic expression involving columns is a column. In evaluating the expression, the computer computes the expression for each row." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 32.134873\n", "1 2.196025\n", "2 2.942576\n", "3 8.850727\n", "4 70.422535\n", "5 19.167186\n", "6 69.675621\n", "7 1.319759\n", "8 11.901928\n", "9 9.326425\n", "10 47.370017\n", "11 87.378641\n", "dtype: float64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deathsColumn = data['TB deaths']\n", "populationColumn = data['Population (1000s)']\n", "rateColumn = deathsColumn * 100 / populationColumn\n", "rateColumn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add a new column to a dataframe, 'select' a non-existing column, i.e. with a new name, and assign to it." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Country</th><th>Population (1000s)</th><th>TB deaths</th><th>TB deaths (per 100000)</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Angola</td><td>21472</td><td>6900</td><td>32.134873</td></tr>\n",
"    <tr><th>1</th><td>Brazil</td><td>200362</td><td>4400</td><td>2.196025</td></tr>\n",
"    <tr><th>2</th><td>China</td><td>1393337</td><td>41000</td><td>2.942576</td></tr>\n",
"    <tr><th>3</th><td>Equatorial Guinea</td><td>757</td><td>67</td><td>8.850727</td></tr>\n",
"    <tr><th>4</th><td>Guinea-Bissau</td><td>1704</td><td>1200</td><td>70.422535</td></tr>\n",
"    <tr><th>5</th><td>India</td><td>1252140</td><td>240000</td><td>19.167186</td></tr>\n",
"    <tr><th>6</th><td>Mozambique</td><td>25834</td><td>18000</td><td>69.675621</td></tr>\n",
"    <tr><th>7</th><td>Portugal</td><td>10608</td><td>140</td><td>1.319759</td></tr>\n",
"    <tr><th>8</th><td>Russian Federation</td><td>142834</td><td>17000</td><td>11.901928</td></tr>\n",
"    <tr><th>9</th><td>Sao Tome and Principe</td><td>193</td><td>18</td><td>9.326425</td></tr>\n",
"    <tr><th>10</th><td>South Africa</td><td>52776</td><td>25000</td><td>47.370017</td></tr>\n",
"    <tr><th>11</th><td>Timor-Leste</td><td>1133</td><td>990</td><td>87.378641</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Country Population (1000s) TB deaths \\\n", "0 Angola 21472 6900 \n", "1 Brazil 200362 4400 \n", "2 China 1393337 41000 \n", "3 Equatorial Guinea 757 67 \n", "4 Guinea-Bissau 1704 1200 \n", "5 India 1252140 240000 \n", "6 Mozambique 25834 18000 \n", "7 Portugal 10608 140 \n", "8 Russian Federation 142834 17000 \n", "9 Sao Tome and Principe 193 18 \n", "10 South Africa 52776 25000 \n", "11 Timor-Leste 1133 990 \n", "\n", " TB deaths (per 100000) \n", "0 32.134873 \n", "1 2.196025 \n", "2 2.942576 \n", "3 8.850727 \n", "4 70.422535 \n", "5 19.167186 \n", "6 69.675621 \n", "7 1.319759 \n", "8 11.901928 \n", "9 9.326425 \n", "10 47.370017 \n", "11 87.378641 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['TB deaths (per 100000)'] = rateColumn\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tasks\n", "\n", "Add code to calculate:\n", "\n", "- the range of the population, in thousands of inhabitants" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- the mean of the death rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- the median of the death rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can answer the questions in the **final quiz**." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }