Summary Statistics by Group of pandas DataFrame in Python (3 Examples)

In this Python tutorial you’ll learn how to calculate summary statistics by group for the columns of a pandas DataFrame.

Table of contents:

1) Example Data & Libraries

2) Example 1: Calculate Mean by Group for Each Column of pandas DataFrame

3) Example 2: Calculate Mean by Multiple Group & Subgroup Columns

4) Example 3: Calculate Multiple Descriptive Statistics by Group

5) Video, Further Resources & Summary

Let’s get started:

Example Data & Libraries

First, we need to load the pandas library:

import pandas as pd                                  # Import pandas library to Python

We’ll also have to construct a pandas DataFrame that we can use in the example syntax later on.

data = pd.DataFrame({'x1':[1, 7, 5, 3, 7, 2, 7, 9],  # Create pandas DataFrame
                     'x2':range(0, 8),
                     'group1':['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})
print(data)                                          # Print pandas DataFrame

Table 1: Example pandas DataFrame.

As Table 1 shows, our example data is a DataFrame with eight rows and four columns. The variables x1 and x2 contain integers, and the variables group1 and group2 are group and subgroup indicators.
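A quick way to check the dimensions and column types of the example data is sketched below. The calls shape, dtypes, and value_counts are standard pandas; the choice to count the rows per group1 value is only an illustration and not part of the original example.

```python
import pandas as pd

# Recreate the example data from this tutorial
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

print(data.shape)                                    # Number of rows and columns
print(data.dtypes)                                   # Data type of each column
print(data['group1'].value_counts())                 # Number of rows per group
```

Knowing the group sizes up front helps when reading the grouped results later on, because a mean based on two rows is less stable than one based on many.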

Example 1: Calculate Mean by Group for Each Column of pandas DataFrame

The following code demonstrates how to calculate the average of each pandas DataFrame column by group.

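For this task, we can use the groupby and mean functions as shown below. Because our DataFrame also contains the non-numeric column group2, we set numeric_only=True; otherwise, pandas 2.0 and later raise a TypeError:

print(data.groupby('group1').mean(numeric_only=True))  # Get mean by group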
#               x1        x2
# group1                    
# A       4.333333  3.333333
# B       6.333333  3.000000
# C       4.500000  4.500000

The previous output shows our result, i.e. the mean values for the variables x1 and x2 separately for each group in the column group1.
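Depending on the pandas version, calling mean on a grouped DataFrame that still contains a text column such as group2 may raise an error. One way around this, shown here as a sketch rather than the only option, is to select the numeric columns explicitly before aggregating. The same pattern works for other statistics; the median is used below purely as an illustration.

```python
import pandas as pd

# Recreate the example data from this tutorial
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

# Select the numeric columns explicitly, then aggregate by group
means = data.groupby('group1')[['x1', 'x2']].mean()
medians = data.groupby('group1')[['x1', 'x2']].median()

print(means)                                         # Mean of x1 and x2 by group
print(medians)                                       # Median of x1 and x2 by group
```

Selecting the columns first makes it explicit which variables are summarized, independent of how the installed pandas version treats non-numeric columns.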

Example 2: Calculate Mean by Multiple Group & Subgroup Columns

This section explains how to compute the mean for each column based on two group columns.

For this, we have to specify the names of the group columns as a list within the groupby function:

print(data.groupby(['group1', 'group2']).mean())     # Get mean by subgroup
#                 x1   x2
# group1 group2          
# A      a       2.0  1.5
#        b       9.0  7.0
# B      a       6.0  1.5
#        b       7.0  6.0
# C      b       4.5  4.5
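The result of grouping by two columns is indexed by a MultiIndex made of group1 and group2. The following sketch shows two common ways of working with such a result: looking up a single subgroup with a tuple, and turning the group labels back into regular columns with reset_index. The variable names sub_means, flat, and value are chosen for this illustration only.

```python
import pandas as pd

# Recreate the example data from this tutorial
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

sub_means = data.groupby(['group1', 'group2']).mean()

# Look up the mean of x1 for group B and subgroup a
value = sub_means.loc[('B', 'a'), 'x1']
print(value)

# Convert the group labels from index levels into ordinary columns
flat = sub_means.reset_index()
print(flat)
```

A flat DataFrame of this kind is often easier to export or merge with other data than one with a MultiIndex.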

Example 3: Calculate Multiple Descriptive Statistics by Group

So far, we have only calculated the mean of each column. However, we can also calculate other descriptive summary statistics for our pandas DataFrame columns.

The following syntax shows how to use the describe function to calculate metrics such as the count, mean, standard deviation, min, max, and different quantiles by group.

Let’s do this:

print(data.groupby('group1').describe())             # Get descriptive stats by group
#           x1                                 ...   x2                      
#        count      mean       std  min   25%  ...  min   25%  50%   75%  max
# group1                                       ...                           
# A        3.0  4.333333  4.163332  1.0  2.00  ...  0.0  1.50  3.0  5.00  7.0
# B        3.0  6.333333  1.154701  5.0  6.00  ...  1.0  1.50  2.0  4.00  6.0
# C        2.0  4.500000  3.535534  2.0  3.25  ...  4.0  4.25  4.5  4.75  5.0

# [3 rows x 16 columns]
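The printed output above is truncated in the middle because it has 16 columns. As a sketch, the statistics of a single variable can be shown in full by selecting its name from the result, and a custom set of statistics can be requested with the agg function. The selection of mean, median, min, and max below is an arbitrary choice for illustration, and the numeric columns are selected explicitly so that the code does not depend on how pandas handles the text column group2.

```python
import pandas as pd

# Recreate the example data from this tutorial
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

# Full descriptive statistics, then only the block belonging to x2
desc = data.groupby('group1')[['x1', 'x2']].describe()
print(desc['x2'])                                    # All eight statistics for x2

# Custom selection of statistics for each numeric column
stats = data.groupby('group1')[['x1', 'x2']].agg(['mean', 'median', 'min', 'max'])
print(stats)
```

Both results have two column levels: the variable name on top and the statistic below it, so a single value can be accessed with a tuple such as ('x1', 'max').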

Video, Further Resources & Summary

If you need further explanations of the Python syntax used in this tutorial, have a look at the following video on my YouTube channel, where I walk through the code step by step.

Furthermore, you may read the related articles on my website. Some tutorials on related topics such as groups, counting, and dates are shown below:

In this post, you have learned how to compute summary statistics by group for the columns of a pandas DataFrame in the Python programming language. Don’t hesitate to let me know in the comments below if you have any additional questions.
