Summary Statistics by Group of pandas DataFrame in Python (3 Examples)
In this Python tutorial you’ll learn how to calculate summary statistics by group for the columns of a pandas DataFrame.
Let’s get started:
Example Data & Libraries
First, we need to load the pandas library:
import pandas as pd # Import pandas library to Python
We’ll also have to construct a pandas DataFrame that we can use in the example syntax later on.
data = pd.DataFrame({'x1':[1, 7, 5, 3, 7, 2, 7, 9],    # Create pandas DataFrame
                     'x2':range(0, 8),
                     'group1':['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})
print(data)                                            # Print pandas DataFrame
As you can see based on Table 1, our example data is a DataFrame with eight rows and four columns. The variables x1 and x2 are integers, and the variables group1 and group2 are group and subgroup indicators.
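Before grouping, it can help to confirm the structure of the data. The following is only a minimal sketch: it recreates the example DataFrame so that it runs on its own, and the names numeric_cols and group_sizes are chosen purely for illustration.

```python
import pandas as pd

# Recreate the example data from above
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

print(data.shape)    # Number of rows and columns
# (8, 4)

# Names of the columns that hold numbers (illustrative variable name)
numeric_cols = data.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Number of rows in each group of group1 (illustrative variable name)
group_sizes = data.groupby('group1').size()
print(group_sizes)
```

The group sizes are worth a look, because a statistic based on only two or three rows should be interpreted with care.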
Example 1: Calculate Mean by Group for Each Column of pandas DataFrame
The following code demonstrates how to calculate the average of each pandas DataFrame column by group.
For this task, we can use the groupby and mean functions as shown below:
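# numeric_only=True skips the text column group2; recent pandas versions
# raise an error instead of dropping such columns automatically
print(data.groupby('group1').mean(numeric_only=True))  # Get mean by group
#               x1        x2
# group1
# A       4.333333  3.333333
# B       6.333333  3.000000
# C       4.500000  4.500000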
The previous output shows our result, i.e. the mean values for the variables x1 and x2 separately for each group in the column group1.
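In case only some of the columns are of interest, they can be selected between the groupby and mean calls. This is a minimal sketch under the assumption that the example data looks as above (it is recreated here so that the snippet is self-contained); the names x1_means and both_means are hypothetical.

```python
import pandas as pd

# Recreate the example data from above
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

# Mean of a single column by group (returns a pandas Series)
x1_means = data.groupby('group1')['x1'].mean()
print(x1_means)

# Mean of an explicit list of columns by group (returns a DataFrame)
both_means = data.groupby('group1')[['x1', 'x2']].mean()
print(both_means)
```

Selecting the columns explicitly also keeps text columns such as group2 out of the calculation.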
Example 2: Calculate Mean by Multiple Group & Subgroup Columns
This section explains how to compute the mean for each column based on two group columns.
For this, we have to specify the names of the group columns as a list within the groupby function:
print(data.groupby(['group1', 'group2']).mean())       # Get mean by subgroup
#                 x1   x2
# group1 group2
# A      a       2.0  1.5
#        b       9.0  7.0
# B      a       6.0  1.5
#        b       7.0  6.0
# C      b       4.5  4.5
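The result of grouping by two columns is indexed by both group columns at once. The sketch below shows one possible way to work with such a result: picking a single subgroup and converting the group labels back to regular columns. The data is recreated so that the code runs on its own, and the names means, a_a, and flat are only illustrative.

```python
import pandas as pd

# Recreate the example data from above
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

means = data.groupby(['group1', 'group2']).mean()

# Select the row for group A and subgroup a using a tuple
a_a = means.loc[('A', 'a')]
print(a_a)

# Turn the two index levels back into ordinary columns
flat = means.reset_index()
print(flat)
```

A flat DataFrame like this is often easier to export or to merge with other data than one with a two-level index.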
Example 3: Calculate Multiple Descriptive Statistics by Group
So far, we have only calculated the mean of each column. However, we can also calculate other descriptive summary statistics for our pandas DataFrame columns.
The following syntax explains how to use the describe function to calculate metrics such as the count, mean, min, max, and different quantiles by group.
Let’s do this:
print(data.groupby('group1').describe())               # Get descriptive stats by group
#           x1                                 ...   x2
#        count      mean       std  min   25%  ...  min   25%  50%   75%  max
# group1                                       ...
# A        3.0  4.333333  4.163332  1.0  2.00  ...  0.0  1.50  3.0  5.00  7.0
# B        3.0  6.333333  1.154701  5.0  6.00  ...  1.0  1.50  2.0  4.00  6.0
# C        2.0  4.500000  3.535534  2.0  3.25  ...  4.0  4.25  4.5  4.75  5.0
#
# [3 rows x 16 columns]
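The describe function always returns the same fixed set of statistics. If only a few specific statistics are needed, one alternative is the agg function with a list of function names. This is a minimal sketch, not the only way to do it; the data is recreated so that the code is self-contained, and the name stats is hypothetical.

```python
import pandas as pd

# Recreate the example data from above
data = pd.DataFrame({'x1': [1, 7, 5, 3, 7, 2, 7, 9],
                     'x2': range(0, 8),
                     'group1': ['A', 'B', 'B', 'A', 'C', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']})

# Calculate only the mean, median, and maximum of x1 and x2 by group
stats = data.groupby('group1')[['x1', 'x2']].agg(['mean', 'median', 'max'])
print(stats)

# Access a single value: the median of x1 in group A
print(stats.loc['A', ('x1', 'median')])
```

The resulting DataFrame has one column per combination of variable and statistic, so it is narrower than the full describe output.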
Video, Further Resources & Summary
If you need further explanations on the Python syntax of this tutorial, you could have a look at the following video on my YouTube channel, in which I explain the code of this tutorial.
Furthermore, you may read the related articles on my website. Some tutorials on related topics such as groups, counting, and dates are shown below:
- pandas Library Tutorial in Python
- Summary Statistics of pandas DataFrame
- Insert Row at Specific Position of pandas DataFrame in Python
- Sort pandas DataFrame by Date in Python
- Check if Column Exists in pandas DataFrame in Python
- Sort pandas DataFrame by Column in Python
- Count Unique Values by Group in Column of pandas DataFrame in Python
- All Python Programming Examples
In this post, you have learned how to compute summary statistics by group for the columns of a pandas DataFrame in the Python programming language. Don’t hesitate to let me know in the comments below if you have any additional questions.