Learn to code for data analysis

1.3 Summary operations

Summary, or aggregation, operations are used to produce a single summary value or statistic, such as the group average, for each separate group.

Find the ‘total’ amount within each group using a summary operation:

To apply a summary operator to each group, such as a function that finds the mean value of each group, and automatically combine the results into a single output dataframe, pass the name of the function to the aggregate() method. If more than one column can be summarised, pandas will apply the operator to each of those columns in the grouped rows separately. For example, if there were a ‘Volume’ column, the output would also contain the total volume for each group.

Let’s again use the example dataframe defined earlier:

In []:




Group the data by commodity type and then apply the sum operation and combine the results in an output dataframe. The grouping elements are used to create index values in the output dataframe.

In []:





In this case, the aggregate() method applies the sum summary operation to each group and then automatically combines the results. For a summary operation such as this, the resulting combined dataframe contains as many rows as there were groups created by the splitting .groupby() operation.
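As a minimal sketch of this step, the example below builds a small made-up dataframe and sums each group. The data, the ‘Volume’ column and the variable names df, grouped and totals are assumptions for illustration, not the course’s own dataframe. The summary operation is passed as the string 'sum', which pandas accepts as the name of a built-in aggregation.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    'Commodity': ['Milk', 'Milk', 'Cheese', 'Milk', 'Cheese', 'Cheese'],
    'Amount': [10, 35, 5, 20, 15, 40],
    'Volume': [100, 350, 50, 200, 150, 400],
})

# Split: one group of rows per distinct commodity.
grouped = df.groupby('Commodity')

# Apply and combine: sum every summarisable column within each group.
totals = grouped.aggregate('sum')

# One row per group, indexed by the grouping values.
print(totals)
```

Because both ‘Amount’ and ‘Volume’ can be summed, the output has a total for each of them, and it has two rows because the grouping created two groups.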

The slightly more general apply() method can also be substituted for the aggregate() method and will similarly take the rows associated with each group, apply a function to them, and return a combined result.

The apply() method can be really handy if you have defined a function of your own that you want to apply to just the rows associated with each group. Simply pass the name of the function to the apply() method and it will then call your function, once per group, on the sets of rows associated with each group.
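For instance, a function of your own might compute the spread of amounts within a group. The sketch below assumes a made-up dataframe, and the function name amount_range is hypothetical. The ‘Amount’ column is selected before calling apply() so that the function receives only that column for each group, which keeps the behaviour the same across recent pandas versions.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    'Commodity': ['Milk', 'Milk', 'Cheese', 'Milk', 'Cheese', 'Cheese'],
    'Amount': [10, 35, 5, 20, 15, 40],
})

def amount_range(g):
    # g holds the rows of one group; return a single value for that group.
    return g['Amount'].max() - g['Amount'].min()

# apply() calls amount_range once per group and combines the results.
ranges = df.groupby('Commodity')[['Amount']].apply(amount_range)

print(ranges)
```

Since the function returns one value per group, the combined result has one entry per commodity.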

For example, find the top two items by ‘Amount’ in each group:

In []:

def top2byAmount(g):
    # Sort the rows of the group by 'Amount', largest first, and keep the first two.
    return g.sort_values('Amount', ascending=False).head(2)




The second index column, containing the numbers 3, 1, 4 etc., holds the original index value of each row.
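To see where that second index column comes from, here is a hedged, self-contained sketch that applies top2byAmount to a made-up dataframe. The data and the variable names df and top2 are assumptions, so the index numbers differ from those in the course output. Only the ‘Amount’ column is selected before apply(), which is a choice made here for consistency across pandas versions.

```python
import pandas as pd

# Hypothetical data: values are invented for illustration.
df = pd.DataFrame({
    'Commodity': ['Milk', 'Milk', 'Cheese', 'Milk', 'Cheese', 'Cheese'],
    'Amount': [10, 35, 5, 20, 15, 40],
})

def top2byAmount(g):
    # Sort the rows of the group by 'Amount', largest first, and keep the first two.
    return g.sort_values('Amount', ascending=False).head(2)

# The function returns rows (not a single value) for each group, so the
# combined result keeps each row's original index under the group key.
top2 = df.groupby('Commodity', group_keys=True)[['Amount']].apply(top2byAmount)

print(top2)
```

The result has a two-level index: the first level is the commodity and the second level is the row’s position in the original dataframe.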

In Week 3 the apply() method was called on a column, to apply the given function to each cell. Here it was called on a grouped dataframe, to apply the given function to each group.
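The difference between the two uses can be sketched side by side. The dataframe and the function names double and largest are hypothetical, chosen only to show what each function receives.

```python
import pandas as pd

# Hypothetical data: values are invented for illustration.
df = pd.DataFrame({
    'Commodity': ['Milk', 'Milk', 'Cheese', 'Milk', 'Cheese', 'Cheese'],
    'Amount': [10, 35, 5, 20, 15, 40],
})

def double(x):
    # Called on a column: x is the value of a single cell.
    return x * 2

def largest(g):
    # Called on a grouped dataframe: g is all the rows of one group.
    return g['Amount'].max()

# One result per cell: the output has as many entries as the column.
doubled = df['Amount'].apply(double)

# One result per group: the output has as many entries as there are groups.
largest_per_group = df.groupby('Commodity')[['Amount']].apply(largest)

print(doubled)
print(largest_per_group)
```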

Exercise 3 Experimenting with split-apply-combine

Work through Exercise 3 in your Exercise notebook 4 to practise the summary operations.

As you complete the tasks, think about these questions:

  • For your dataset, which months saw the highest and lowest levels of trade activity? Did there appear to be any seasonal behaviour?
  • When graphically comparing total trade flows from the leading partner countries to the World total, did it look as if any partners particularly dominated that area of trade? If you have time, find news reports discussing why this should be the case.
