Learn to code for data analysis
Learn to code for data analysis

Start this free course now. Just create an account and sign in. Enrol and complete the course for a free statement of participation or digital badge if available.

Free course

Learn to code for data analysis

1.4 Filtering groups

Being able to group rows according to some criterion and then apply various operations to those groups is a very powerful technique.

However, there may be occasions when you only want to work with a subset of the groups that can be extracted from a single dataset based on a particular group property. For example, it might require that:

  • groups that contain a minimum number of rows, such as countries that engage in trade around a particular commodity with a minimum number of partner countries
  • groups for whom a summary statistic meets certain conditions (for example, the total value of exports for a particular commodity exceeds a particular threshold value, or whose minimum or maximum value are below a certain value)
  • a ranking of the groups based on a particular summary statistic, such as the total trade value, that returns only the top five or bottom three groups according to that ranking.

In the following example, where groups are selected based on group size, a filtering operation is applied to limit an original dataset so that it includes just those groups containing at least three rows, combining the rows from the selected groups back together again to produce the output dataset:

In pandas, groups can be filtered based on their group properties using the filter() method. Using the example dataframe again:

In []:

df

Out[]:

CommodityAmount
0A10
1A15
2A5
3A20
4B10
5B10
6B5
7C20
8C30

For example, the dataframe can be filtered to return just the rows from groups where there is a maximum number of rows in the group.

As a reference point, count how many rows are associated with each group.

In []:

grouped = df.groupby('Commodity')

grouped.aggregate(len)

Out[]:

Amount
Commodity
A4
B3
C2

The filter() method uses a function that returns a boolean ( True or False ) value to decide whether or not to filter through the rows associated with a particular group.

As with the apply() method, provide the filter() method with just a function name in order to pass each group to that function. For example, define a function that says whether or not a group contains three or fewer rows and use that as a basis for filtering the original dataset.

In []:

def groupsOfAtMostThreeRows(g):

return len(g) <= 3

grouped.filter(groupsOfAtMostThreeRows)

Out[]:

CommodityAmount
4B10
5B10
6B5
7C20
8C30

Alternatively, all the rows in a group can be filtered on an aggregate property of the group such as the sum total, or maximum, minimum or mean value, from one of the columns.

In []:

#Consider the following total amounts by group

grouped.aggregate(sum)

Out[]:

Amount
Commodity
A50
B25
C50

In []:

pivot_table(df,

index=['Commodity','Partner'],

values='Amount',

aggfunc=sum)

Out[]:

CommodityAmount
0A10
1A15
2A5
3A20
7C20
8C30

The ability to filter datasets based on group properties means that large datasets can more easily be limited to just those rows associated with groups of rows that are deemed to be relevant in some way.

Exercise 4 Filtering groups

Use the Exercise notebook 4 to practise filtering in Exercise 4.

As you complete the tasks, think about other questions you could ask of your data using the filter command.

LCDAB_1

Take your learning further371

Making the decision to study can be a big step, which is why you'll want a trusted University. The Open University has 50 years’ experience delivering flexible learning and 170,000 students are studying with us right now. Take a look at all Open University courses372.

If you are new to university level study, we offer two introductory routes to our qualifications. Find out Where to take your learning next?373 You could either choose to start with an Access courses374or an open box module, which allows you to count your previous learning towards an Open University qualification.

Not ready for University study then browse over 1000 free courses on OpenLearn375 and sign up to our newsletter376 to hear about new free courses as they are released.

Every year, thousands of students decide to study with The Open University. With over 120 qualifications, we’ve got the right course for you.

Request an Open University prospectus371