Learn to code for data analysis

2.1 Scatterplots

Statistics can be misleading. A correlation coefficient of zero only states that there is no ranking relation between the indicators; there might still be some other relationship.

In the next example, the correlation between x and y is zero, but they are clearly related (y is the square of x).

In []:

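# imports repeated here so that the cell runs on its own
from pandas import DataFrame
from scipy.stats import spearmanr

# y is the square of x, so the two columns are clearly related
table = [ [-2,4], [-1,1], [0,0], [1,1], [2,4] ]
data = DataFrame(columns=['x', 'y'], data=table)
# spearmanr returns the rank correlation coefficient and its p-value
(correlation, pValue) = spearmanr(data['x'], data['y'])
print('The correlation is', correlation)
data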

Out[]:

The correlation is 0.0

    x  y
0  -2  4
1  -1  1
2   0  0
3   1  1
4   2  4
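To see where the zero comes from, here is a minimal sketch in plain Python of what a rank correlation calculates: each column is replaced by its ranks (tied values share the mean of their ranks), and the ordinary correlation of those ranks is taken. The function names average_ranks and rank_correlation are made up for this illustration; they are not part of pandas or SciPy, and spearmanr may differ in implementation details.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    ordered = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(ordered):
        end = start
        # Extend the group while the next value ties with the current one.
        while (end + 1 < len(ordered)
               and values[ordered[end + 1]] == values[ordered[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[ordered[position]] = mean_rank
        start = end + 1
    return ranks


def rank_correlation(xs, ys):
    """Correlation of the ranks of xs and ys (hypothetical helper)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]       # same table as in the example above
cubes = [x ** 3 for x in xs]    # always increasing with x, for contrast

print('Ranks of y:', average_ranks(ys))
print('Rank correlation of x and x squared:', rank_correlation(xs, ys))
print('Rank correlation of x and x cubed:', rank_correlation(xs, cubes))
```

The ranks of y fall and then rise as x increases, so the two halves cancel out and the coefficient is zero. For the cubes, which always increase with x, the coefficient is 1 (up to rounding).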

It’s therefore best to complement the quantitative analysis with a more qualitative view provided by a chart. In the case of correlations, scatterplots will do very nicely. Each country is a dot plotted at the x and y coordinates corresponding to the GDP and life expectancy values.

In []:

%matplotlib inline

gdpVsLife.plot(x=GDP, y=LIFE, kind='scatter', grid=True)

Out[]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e2e6eb8>

Figure 4

This graph is not very useful. The GDP difference between the poorest and richest countries is so vast that the whole chart is squashed to fit all GDP values on the x-axis. It is best to use a logarithmic scale, where the axis values don’t increase by a constant interval (10, 20, 30, for example), but by a multiplicative factor (10, 100, 1000, 10000, etc.). The parameter logx has to be set to True to get a logarithmic scale on the x-axis. Moreover, let’s make the chart a bit wider, by using the figsize parameter you saw last week.
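As a rough illustration of why a logarithmic axis un-squashes the chart, the sketch below compares the gaps between a few values on a linear axis and on a logarithmic one, using plain Python. The values are made up for the example and are not taken from the GDP dataset.

```python
import math

# Illustrative values only (not real GDP figures): each is 10 times the previous one.
values = [100, 1000, 10000, 100000]

# On a linear axis, a value's position is the value itself.
linear_gaps = [later - earlier for earlier, later in zip(values, values[1:])]

# On a logarithmic axis, a value's position is its base-10 logarithm.
log_positions = [math.log10(value) for value in values]
log_gaps = [later - earlier
            for earlier, later in zip(log_positions, log_positions[1:])]

print('Gaps on a linear axis:', linear_gaps)
print('Gaps on a logarithmic axis:', log_gaps)
```

On the linear axis each gap is ten times the previous one, so the small values end up crammed together. On the logarithmic axis every tenfold step takes up the same distance.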

In []:

gdpVsLife.plot(x=GDP, y=LIFE, kind='scatter', grid=True,
               logx=True, figsize=(10, 4))

Out[]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e400588>

Figure 5

The major tick marks on the x-axis go from 10² (that’s a one followed by two zeros, hence 100) to 10⁸ (that’s a one followed by eight zeros, hence 100,000,000) million pounds, with the minor ticks marking the numbers in between. For example, the eight minor ticks between 10² and 10³ represent the values 200 (2 × 10²), 300 (3 × 10²), and so on until 900 (9 × 10²). As a further example, the country with the lowest life expectancy is on the second minor tick to the right of 10³, which means its GDP is about 3 × 10³ (three thousand) million pounds.
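The tick reading described above can be checked with a few lines of plain Python. This is only a sketch: minor_ticks is a hypothetical helper written for this illustration, and it assumes, as in the description above, that the minor ticks sit at the multiples 2 to 9 of each power of ten.

```python
def minor_ticks(exponent):
    """Values of the minor ticks between 10**exponent and 10**(exponent + 1)."""
    return [multiple * 10 ** exponent for multiple in range(2, 10)]


# The eight minor ticks between 100 and 1000.
between_hundred_and_thousand = minor_ticks(2)
print(between_hundred_and_thousand)
print(len(between_hundred_and_thousand))

# The second minor tick to the right of 10**3 (list positions start at 0).
print(minor_ticks(3)[1])
```

The last line prints 3000, matching the GDP of about three thousand million pounds read off the chart.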

Countries with a GDP of around 10 thousand (10⁴) million pounds have a wide range of life expectancies, from under 50 to over 80, but the range tends to shrink both for poorer and for richer countries. Countries with the lowest life expectancy are neither the poorest nor the richest, but those with the highest life expectancy are among the richer countries.

Exercise 11 Scatterplots

Practise using scatterplots in Exercise 11 in the Exercise notebook 3.
