1.4 Applying functions
Having coded the three data conversion functions, I can now apply them to the GDP table.
I first select the relevant column:
column = gdp['Country']
column
0 UK
1 USA
2 China
3 Brazil
4 South Africa
Name: Country, dtype: object
Next, I use the column method apply(), which applies a given function to each cell in the column and returns a new column in which each cell is the conversion of the corresponding original cell:
column.apply(expandCountry)
0 United Kingdom
1 United States
2 China
3 Brazil
4 South Africa
Name: Country, dtype: object
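To see what apply() does cell by cell, here is a minimal, self-contained sketch. The definition of expandCountry below is an assumption made for illustration (the course defines its own version in an earlier section), and the column is rebuilt by hand rather than taken from the gdp dataframe.

```python
import pandas as pd

# Assumed stand-in for the conversion function coded earlier in the course.
def expandCountry(name):
    if name == 'UK':
        return 'United Kingdom'
    elif name == 'USA':
        return 'United States'
    else:
        return name

# A column (Series) with the same values as gdp['Country'].
countries = pd.Series(['UK', 'USA', 'China', 'Brazil', 'South Africa'],
                      name='Country')

# apply() calls the function once per cell and collects the results
# in a new column; the original column is left unchanged.
expanded = countries.apply(expandCountry)

# The same conversion written as an explicit loop over the cells.
by_hand = [expandCountry(cell) for cell in countries]

print(list(expanded) == by_hand)
print(countries.iloc[0], '->', expanded.iloc[0])
```

Note that the function is passed by name, without parentheses: apply() is the one that calls it, once for each cell.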
Finally, I add that new column to the dataframe, using a new column heading:
gdp['Country name'] = column.apply(expandCountry)
gdp
| | Country | GDP (US$) | Country name |
|---|---|---|---|
| 0 | UK | 2.678455e+12 | United Kingdom |
| 1 | USA | 1.676810e+13 | United States |
| 2 | China | 9.240270e+12 | China |
| 3 | Brazil | 2.245673e+12 | Brazil |
| 4 | South Africa | 3.660579e+11 | South Africa |
In a similar way, I can convert the US dollars to British pounds, then round to the nearest million, and store the result in a new column. I could apply the conversion and rounding functions in two separate statements, but using method chaining, I can apply both functions in a single line of code. This is possible because the column returned by the first call of apply() is the context for the second call of apply(). Here’s how it’s written:
column = gdp['GDP (US$)']
result = column.apply(usdToGbp).apply(roundToMillions)
gdp['GDP (£m)'] = result
gdp
| | Country | GDP (US$) | Country name | GDP (£m) |
|---|---|---|---|---|
| 0 | UK | 2.678455e+12 | United Kingdom | 1711727 |
| 1 | USA | 1.676810e+13 | United States | 10716029 |
| 2 | China | 9.240270e+12 | China | 5905202 |
| 3 | Brazil | 2.245673e+12 | Brazil | 1435148 |
| 4 | South Africa | 3.660579e+11 | South Africa | 233937 |
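The following sketch compares the two-statement version with the chained version, to show that they produce the same column. The bodies of usdToGbp and roundToMillions are assumptions made for illustration, and the exchange rate in particular is a hypothetical value that is not stated in this section.

```python
import pandas as pd

# Assumed definitions; the course codes its own versions earlier on.
def usdToGbp(usd):
    return usd / 1.564768  # hypothetical exchange rate, for illustration only

def roundToMillions(value):
    return round(value / 1000000)

# Two of the values from the 'GDP (US$)' column.
usd = pd.Series([2.678455e+12, 1.676810e+13], name='GDP (US$)')

# Two separate statements: the intermediate column gets its own name.
pounds = usd.apply(usdToGbp)
two_steps = pounds.apply(roundToMillions)

# Method chaining: the column returned by the first apply()
# is the context of the second apply().
chained = usd.apply(usdToGbp).apply(roundToMillions)

print(list(two_steps) == list(chained))
```

Chaining saves naming the intermediate column, at the cost of not being able to inspect it. While developing the code, the two-statement form can be easier to check.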
Now it’s just a matter of selecting the two new columns, as the original ones are no longer needed.
headings = ['Country name', 'GDP (£m)']
gdp = gdp[headings]
gdp
| | Country name | GDP (£m) |
|---|---|---|
| 0 | United Kingdom | 1711727 |
| 1 | United States | 10716029 |
| 2 | China | 5905202 |
| 3 | Brazil | 1435148 |
| 4 | South Africa | 233937 |
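As a small illustration of the selection step, the sketch below builds a two-row table by hand (the name table is hypothetical, chosen to avoid confusion with gdp) and contrasts selecting with a list of headings against selecting with a single heading.

```python
import pandas as pd

# A hand-built table with two rows taken from the output above.
table = pd.DataFrame({
    'Country': ['UK', 'USA'],
    'Country name': ['United Kingdom', 'United States'],
    'GDP (£m)': [1711727, 10716029],
})

headings = ['Country name', 'GDP (£m)']

# A list of headings selects several columns and returns a dataframe,
# with the columns in the order given in the list.
selected = table[headings]

# A single heading selects one column and returns a column (a Series).
single = table['Country name']

print(list(selected.columns))
print(type(selected).__name__, type(single).__name__)
```

The selection returns a new dataframe and leaves the original untouched, which is why the result is assigned back to gdp in the code above.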
Note that method chaining only works if the chained methods return the same type of value as their context, in the same way that you can chain multiple arithmetic operators (e.g. 3+4-5) because each one takes two numbers and returns a number that is used by the next operator in the chain. In this course, methods only have two possible contexts, columns and dataframes, so you can either chain column methods that return a single column (that is, a Series), like apply(), or dataframe methods that return dataframes. For example, gdp.head(4).tail(2) is a dataframe with just China and Brazil, i.e. the last two of the first four rows of the dataframe shown above. You’ll see further examples of chaining (and an easier way to select multiple rows) later this week.
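Chaining dataframe methods can be sketched as follows, assuming the pandas dataframe methods head() and tail(), each of which returns a dataframe. The table is rebuilt by hand from the output above under the hypothetical name table.

```python
import pandas as pd

# Hand-built copy of the final table shown above.
table = pd.DataFrame({
    'Country name': ['United Kingdom', 'United States', 'China',
                     'Brazil', 'South Africa'],
    'GDP (£m)': [1711727, 10716029, 5905202, 1435148, 233937],
})

# head(4) returns a dataframe with the first four rows, so tail(2)
# can be called on it to keep the last two of those four.
middle = table.head(4).tail(2)

print(list(middle['Country name']))
print(type(middle).__name__)
```

The rows keep their original index labels, so the result is indexed 2 and 3 rather than 0 and 1.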
This concludes the data transformation part. After applying functions in the next exercise, you’ll learn how to combine two tables.
Exercise 4 Applying functions
You can practise applying functions in Exercise 4 of your Exercise notebook 3.
