{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Project 1: Deaths by tuberculosis\n", "\n", "by Michel Wermelinger, 14 July 2015, edited 5 April 2016, updated 18 October and 20 December 2017 \n", "\n", "This is the project notebook for the first part of The Open University's _Learn to code for Data Analysis_ course.\n", "\n", "In 2000, the United Nations set eight Millenium Development Goals (MDGs) to reduce poverty and diseases, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB).\n", "TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but is far deadlier. For more information, see the World Health Organisation (WHO) page .\n", "\n", "Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered: \n", "\n", "- What is the total, maximum, minimum and average number of deaths in that year?\n", "- Which countries have the most and the least deaths?\n", "- What is the death rate (deaths per 100,000 inhabitants) for each country?\n", "- Which countries have the lowest and highest death rate?\n", "\n", "The death rate allows for a better comparison of countries with widely different population sizes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The data\n", "\n", "The data consists of total population and total number of deaths due to TB (excluding HIV) in 2013 in each of the BRICS (Brazil, Russia, India, China, South Africa) and Portuguese-speaking countries. \n", "\n", "The data was taken in July 2015 from (population) and (deaths). The uncertainty bounds of the number of deaths were ignored.\n", "\n", "The data was collected into an Excel file which should be in the same folder as this notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryPopulation (1000s)TB deaths
0Angola214726900
1Brazil2003624400
2China139333741000
3Equatorial Guinea75767
4Guinea-Bissau17041200
5India1252140240000
6Mozambique2583418000
7Portugal10608140
8Russian Federation14283417000
9Sao Tome and Principe19318
10South Africa5277625000
11Timor-Leste1133990
\n", "
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "0 Angola 21472 6900\n", "1 Brazil 200362 4400\n", "2 China 1393337 41000\n", "3 Equatorial Guinea 757 67\n", "4 Guinea-Bissau 1704 1200\n", "5 India 1252140 240000\n", "6 Mozambique 25834 18000\n", "7 Portugal 10608 140\n", "8 Russian Federation 142834 17000\n", "9 Sao Tome and Principe 193 18\n", "10 South Africa 52776 25000\n", "11 Timor-Leste 1133 990" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "from pandas import *\n", "data = read_excel('WHO POP TB some.xls')\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The range of the problem\n", "\n", "The column of interest is the last one." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tbColumn = data['TB deaths']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The total number of deaths in 2013 is:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "354715" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The largest and smallest number of deaths in a single country are:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "240000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.max()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From less than 20 to almost a quarter of a million deaths is a huge range. The average number of deaths, over all countries in the data, can give a better idea of the seriousness of the problem in each country.\n", "The average can be computed as the mean or the median. Given the wide range of deaths, the median is probably a more sensible average measure." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "29559.583333333332" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.mean()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5650.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbColumn.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The median is far lower than the mean. This indicates that some of the countries had a very high number of TB deaths in 2013, pushing the value of the mean up." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The most affected\n", "\n", "To see the most affected countries, the table is sorted in ascending order by the last column, which puts those countries in the last rows." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryPopulation (1000s)TB deaths
9Sao Tome and Principe19318
3Equatorial Guinea75767
7Portugal10608140
11Timor-Leste1133990
4Guinea-Bissau17041200
1Brazil2003624400
0Angola214726900
8Russian Federation14283417000
6Mozambique2583418000
10South Africa5277625000
2China139333741000
5India1252140240000
\n", "
" ], "text/plain": [ " Country Population (1000s) TB deaths\n", "9 Sao Tome and Principe 193 18\n", "3 Equatorial Guinea 757 67\n", "7 Portugal 10608 140\n", "11 Timor-Leste 1133 990\n", "4 Guinea-Bissau 1704 1200\n", "1 Brazil 200362 4400\n", "0 Angola 21472 6900\n", "8 Russian Federation 142834 17000\n", "6 Mozambique 25834 18000\n", "10 South Africa 52776 25000\n", "2 China 1393337 41000\n", "5 India 1252140 240000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sort_values('TB deaths')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table raises the possibility that a large number of deaths may be partly due to a large population. To compare the countries on an equal footing, the death rate per 100,000 inhabitants is computed." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryPopulation (1000s)TB deathsTB deaths (per 100,000)
0Angola21472690032.134873
1Brazil20036244002.196025
2China1393337410002.942576
3Equatorial Guinea757678.850727
4Guinea-Bissau1704120070.422535
5India125214024000019.167186
6Mozambique258341800069.675621
7Portugal106081401.319759
8Russian Federation1428341700011.901928
9Sao Tome and Principe193189.326425
10South Africa527762500047.370017
11Timor-Leste113399087.378641
\n", "
" ], "text/plain": [ " Country Population (1000s) TB deaths \\\n", "0 Angola 21472 6900 \n", "1 Brazil 200362 4400 \n", "2 China 1393337 41000 \n", "3 Equatorial Guinea 757 67 \n", "4 Guinea-Bissau 1704 1200 \n", "5 India 1252140 240000 \n", "6 Mozambique 25834 18000 \n", "7 Portugal 10608 140 \n", "8 Russian Federation 142834 17000 \n", "9 Sao Tome and Principe 193 18 \n", "10 South Africa 52776 25000 \n", "11 Timor-Leste 1133 990 \n", "\n", " TB deaths (per 100,000) \n", "0 32.134873 \n", "1 2.196025 \n", "2 2.942576 \n", "3 8.850727 \n", "4 70.422535 \n", "5 19.167186 \n", "6 69.675621 \n", "7 1.319759 \n", "8 11.901928 \n", "9 9.326425 \n", "10 47.370017 \n", "11 87.378641 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "populationColumn = data['Population (1000s)']\n", "data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusions\n", "\n", "The BRICS and Portuguese-speaking countries had a total of about 350 thousand deaths due to TB in 2013. The median shows that half of these coutries had fewer than 5,650 deaths. The much higher mean (over 29,000) indicates that some countries had a very high number. The least affected were Sao Tome and Principe and Equatorial Guinea, with 18 and 67 deaths respectively, and the most affected were China and India with 41 thousand and 240 thousand deaths in a single year. However, taking the population size into account, the least affected were Portugal and Brazil with less than 2.2 deaths per 100 thousand inhabitants, and the most affected were Guinea-Bissau and East Timor with over 70 deaths per 100,000 inhabitants.\n", "\n", "One should not forget that most values are estimates, and that the chosen countries are a small sample of all the world's countries. Nevertheless, they convey the message that TB is still a major cause of fatalities, and that there is a huge disparity between countries, with several ones being highly affected." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }