1.4 Applying functions
Having coded the three data conversion functions, I can now apply them to the GDP table.
I first select the relevant column:
In []:
column = gdp['Country']
column
Out[]:
0 UK
1 USA
2 China
3 Brazil
4 South Africa
Name: Country, dtype: object
Next, I use the column method apply(), which applies a given function to each cell in the column and returns a new column in which each cell is the conversion of the corresponding original cell:
In []:
column.apply(expandCountry)
Out[]:
0 United Kingdom
1 United States
2 China
3 Brazil
4 South Africa
Name: Country, dtype: object
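To try apply() in isolation, here is a minimal self-contained sketch. The expandCountry function below is a stand-in written for illustration; it is assumed to behave like the function coded earlier, not to be identical to it.

```python
import pandas as pd

def expandCountry(name):
    # Stand-in (an assumption): expand two abbreviations,
    # leave every other name unchanged.
    if name == 'UK':
        return 'United Kingdom'
    elif name == 'USA':
        return 'United States'
    else:
        return name

column = pd.Series(['UK', 'USA', 'China', 'Brazil', 'South Africa'],
                   name='Country')

# apply() calls expandCountry once per cell and collects the
# results in a new column; the original column is not modified.
expanded = column.apply(expandCountry)

print(column[0], '->', expanded[0])
```

Because apply() returns a new column rather than changing the original, the result has to be stored somewhere to be kept.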
Finally, I add that new column to the dataframe, using a new column heading:
In []:
gdp['Country name'] = column.apply(expandCountry)
gdp
Out[]:
| | Country | GDP (US$) | Country name |
|---|---|---|---|
| 0 | UK | 2.678455e+12 | United Kingdom |
| 1 | USA | 1.676810e+13 | United States |
| 2 | China | 9.240270e+12 | China |
| 3 | Brazil | 2.245673e+12 | Brazil |
| 4 | South Africa | 3.660579e+11 | South Africa |
In a similar way, I can convert the US dollars to British pounds, round the result to the nearest million, and store it in a new column. I could apply the conversion and rounding functions in two separate statements, but with method chaining I can apply both functions in a single line of code. This is possible because the column returned by the first call of apply() is the context for the second call of apply(). Here’s how it’s written:
In []:
column = gdp['GDP (US$)']
result = column.apply(usdToGbp).apply(roundToMillions)
gdp['GDP (£m)'] = result
gdp
Out[]:
| | Country | GDP (US$) | Country name | GDP (£m) |
|---|---|---|---|---|
| 0 | UK | 2.678455e+12 | United Kingdom | 1711727 |
| 1 | USA | 1.676810e+13 | United States | 10716029 |
| 2 | China | 9.240270e+12 | China | 5905202 |
| 3 | Brazil | 2.245673e+12 | Brazil | 1435148 |
| 4 | South Africa | 3.660579e+11 | South Africa | 233937 |
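To make the equivalence concrete, the sketch below computes the same column in two separate statements and as a single chained expression. The two function bodies and the exchange rate are assumptions made for illustration; the functions coded earlier may differ in detail.

```python
import pandas as pd

def usdToGbp(usd):
    # Assumed exchange rate (US dollars per pound), for illustration only.
    return usd / 1.564768

def roundToMillions(value):
    # Express the value in millions, rounded to the nearest million.
    return round(value / 1000000)

column = pd.Series([2.678455e+12, 1.676810e+13], name='GDP (US$)')

# Two separate statements, with an intermediate column...
pounds = column.apply(usdToGbp)
twoSteps = pounds.apply(roundToMillions)

# ...and the same computation as one chained expression.
chained = column.apply(usdToGbp).apply(roundToMillions)

print(chained.equals(twoSteps))
```

The chained form avoids naming an intermediate column that is never used again.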
Now it’s just a matter of selecting the two new columns, as the original ones are no longer needed.
In []:
headings = ['Country name', 'GDP (£m)']
gdp = gdp[headings]
gdp
Out[]:
| | Country name | GDP (£m) |
|---|---|---|
| 0 | United Kingdom | 1711727 |
| 1 | United States | 10716029 |
| 2 | China | 5905202 |
| 3 | Brazil | 1435148 |
| 4 | South Africa | 233937 |
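As a small sketch of what the selection does, the code below builds a two-row version of the table by hand (the dataframe is reconstructed here purely for illustration) and selects columns with a list of headings.

```python
import pandas as pd

gdp = pd.DataFrame({
    'Country': ['UK', 'USA'],
    'GDP (US$)': [2.678455e+12, 1.676810e+13],
    'Country name': ['United Kingdom', 'United States'],
    'GDP (£m)': [1711727, 10716029]
})

headings = ['Country name', 'GDP (£m)']

# Indexing with a list of headings returns a new dataframe that has
# only those columns, in the order given in the list.
selected = gdp[headings]

print(list(selected.columns))
```

The original dataframe keeps all four columns until the result is assigned back to the name gdp.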
Note that method chaining only works if each method in the chain returns the same type of value as its context, so that the result can serve as the context of the next method. In the same way, you can chain multiple arithmetic operators (e.g. 3+4-5) because each one takes two numbers and returns a number that is used by the next operator in the chain. In this course, methods only have two possible contexts, columns and dataframes, so you can either chain column methods that return a single column (that is, a Series), like apply(), or dataframe methods that return dataframes. For example, gdp.head(4).tail(2) is a dataframe with just China and Brazil, i.e. the last two of the first four rows of the dataframe shown above. You’ll see further examples of chaining (and an easier way to select multiple rows) later this week.
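The head() and tail() example can be tried with the sketch below, which rebuilds the final table by hand so that it runs on its own.

```python
import pandas as pd

gdp = pd.DataFrame({
    'Country name': ['United Kingdom', 'United States', 'China',
                     'Brazil', 'South Africa'],
    'GDP (£m)': [1711727, 10716029, 5905202, 1435148, 233937]
})

# head(4) returns a dataframe, so the dataframe method tail()
# can be chained onto it.
firstFour = gdp.head(4)
middle = gdp.head(4).tail(2)

print(type(firstFour).__name__)
print(middle)
```

The rows keep their original index values, so China and Brazil are still labelled 2 and 3 in the result.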
This concludes the data transformation part. After applying functions in the next exercise, you’ll learn how to combine two tables.
Exercise 4 Applying functions
You can practise applying functions in Exercise 4 of your Exercise notebook 3.