1.2 Looking at apply and combine operations
Once a dataset has been split into groups, an operation is ‘applied’ to each group.
The operation often takes one of two forms:
- a ‘summary’ operation, in which a summary statistic based on the rows contained within each group is generated. A single value is returned for each group, for example, the group median or mean, the number of rows in the group, or the maximum or minimum value in the group. The final result will have M rows, one for each of the M groups created by the split (that is, .groupby()) operation.
- a ‘filtering’ or ‘filtration’ operation, in which whole groups of rows are retained or discarded based on a particular property of the group as a whole. For example, only groups of rows where the sum of all the values in the group is above some threshold are retained. The effect is that each retained group keeps all of its rows, but the resulting dataset (after combination, see below) may contain fewer groups than the original.
The results of applying the summary or filtration operation are then combined to provide a single output dataframe.
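As a minimal sketch of the two forms, the following example applies a summary operation and a filtration operation to the same grouping. The dataframe, its column names (region, sales) and the threshold of 20 are all invented for illustration; they are assumptions, not part of any dataset used in this course.

```python
import pandas as pd

# Hypothetical example data: three regions with differing numbers of rows
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south", "east"],
    "sales": [10, 20, 5, 5, 5, 40],
})

# Split: group the rows by region
grouped = df.groupby("region")

# Apply a summary operation: one value (the total sales) per group,
# combined into a result with one row for each of the three groups
summary = grouped["sales"].sum()

# Apply a filtration operation: keep only those groups whose total
# sales exceed the (arbitrary) threshold of 20; every row of a
# retained group is kept, and every row of a discarded group is dropped
filtered = grouped.filter(lambda g: g["sales"].sum() > 20)

print(summary)
print(filtered)
```

Here the summary result has one row per region, whereas the filtered result keeps the original rows for the north and east groups and drops all of the rows in the south group, whose total falls below the threshold.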
In the next section, you will see how to apply a variety of summary operations; examples of filtration operations follow in a later step.