# GroupBy pandas DataFrame in Python (2 Examples)

In this tutorial you'll learn how to **aggregate a pandas DataFrame by a group column** in Python.

Here's how to do it...

## Example Data & Software Libraries

To be able to use the functions of the pandas library, we first need to import pandas to Python:

    import pandas as pd                                          # Import pandas library

The data below will be used as a basis for this Python programming tutorial:

    data = pd.DataFrame({'x1':[6, 5, 3, 2, 5, 8, 9, 7, 2, 8],    # Create pandas DataFrame
                         'x2':range(9, 19),
                         'group1':['A', 'B', 'A', 'A', 'C', 'C', 'A', 'C', 'B', 'A'],
                         'group2':['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']})
    print(data)                                                  # Print pandas DataFrame

Table 1 shows the structure of our example pandas DataFrame: It has ten rows and four columns. Two of these columns contain integers (i.e. x1 and x2), and two of these columns will be used to group our data set (i.e. group1 and group2).
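To double-check this structure in code, a quick sketch like the one below can help. It rebuilds the same example DataFrame so that it runs on its own. The variable names `n_rows`, `n_cols`, `numeric_cols`, and `group_sizes` are chosen here just for illustration.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
data = pd.DataFrame({'x1': [6, 5, 3, 2, 5, 8, 9, 7, 2, 8],
                     'x2': range(9, 19),
                     'group1': ['A', 'B', 'A', 'A', 'C', 'C', 'A', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']})

n_rows, n_cols = data.shape                                  # Number of rows and columns
print(n_rows, n_cols)                                        # 10 4

numeric_cols = data.select_dtypes(include='number').columns.tolist()
print(numeric_cols)                                          # ['x1', 'x2']

group_sizes = data.groupby('group1').size()                  # Number of rows per group
print(group_sizes)
```

Checking the group sizes up front is useful, because some statistics (such as the variance) are not defined for groups with only one row.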

## Example 1: GroupBy pandas DataFrame Based On One Group Column

In this example, I'll demonstrate how to calculate certain summary statistics for a pandas DataFrame by group based on one grouping column.

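For this task, we can use the groupby function. Because the column group2 contains strings, the code sets numeric_only=True so that only x1 and x2 are aggregated (recent pandas versions raise an error when asked for the mean of a string column). The following Python code returns the mean by group...

    print(data.groupby('group1').mean(numeric_only=True))        # Get mean by group
    #               x1         x2
    # group1
    # A       5.600000  13.000000
    # B       3.500000  13.500000
    # C       6.666667  14.333333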

...the Python syntax below finds the sum by group...

    # numeric_only=True leaves out the string column group2
    print(data.groupby('group1').sum(numeric_only=True))         # Get sum by group
    #         x1  x2
    # group1
    # A       28  65
    # B        7  27
    # C       20  43

...and the following syntax computes the sample variance by group (by default, pandas divides by n - 1):

    # numeric_only=True leaves out the string column group2
    print(data.groupby('group1').var(numeric_only=True))         # Get variance by group
    #               x1         x2
    # group1
    # A       9.300000  12.500000
    # B       4.500000  24.500000
    # C       2.333333   2.333333
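The variance values shown above correspond to the sample variance, since the var function of pandas uses ddof=1 by default. As a minimal sketch, the code below contrasts this default with the population variance obtained via ddof=0. The names `grouped`, `var_sample`, and `var_population` are illustrative, and the numeric columns are selected explicitly so that the string column group2 is not aggregated.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
data = pd.DataFrame({'x1': [6, 5, 3, 2, 5, 8, 9, 7, 2, 8],
                     'x2': range(9, 19),
                     'group1': ['A', 'B', 'A', 'A', 'C', 'C', 'A', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']})

grouped = data.groupby('group1')[['x1', 'x2']]               # Keep only the numeric columns

var_sample = grouped.var()                                   # Default ddof=1: divide by n - 1
var_population = grouped.var(ddof=0)                         # ddof=0: divide by n

print(var_sample)
print(var_population)
```

For group A and column x1, the sum of squared deviations is 37.2 over five rows, so the sample variance is 37.2 / 4 = 9.3 and the population variance is 37.2 / 5 = 7.44.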

## Example 2: GroupBy pandas DataFrame Based On Multiple Group Columns

In this example, I'll demonstrate how to apply the groupby function to two different group variables simultaneously.

To accomplish this, we have to specify a list of group indicators within the groupby function.

Below, you can find the syntax to calculate the mean by multiple groups...

    print(data.groupby(['group1', 'group2']).mean())             # Get mean by multiple groups
    #                      x1         x2
    # group1 group2
    # A      a       3.666667  10.666667
    #        b       8.500000  16.500000
    # B      a       5.000000  10.000000
    #        b       2.000000  17.000000
    # C      a       5.000000  13.000000
    #        b       7.500000  15.000000

...the sum by two groups...

    print(data.groupby(['group1', 'group2']).sum())              # Get sum by multiple groups
    #                x1  x2
    # group1 group2
    # A      a       11  32
    #        b       17  33
    # B      a        5  10
    #        b        2  17
    # C      a        5  13
    #        b       15  30

...and the variance by multiple groups:

    print(data.groupby(['group1', 'group2']).var())              # Get variance by multiple groups
    #                      x1        x2
    # group1 group2
    # A      a       4.333333  2.333333
    #        b       0.500000  4.500000
    # B      a            NaN       NaN
    #        b            NaN       NaN
    # C      a            NaN       NaN
    #        b       0.500000  2.000000
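The NaN values in the variance output belong to group combinations that contain only a single row, because the sample variance is not defined for one observation. As a sketch, the row count and the variance can be put side by side using the agg function. The restriction to column x1 and the names `summary` and `flat` are choices made here for illustration.

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
data = pd.DataFrame({'x1': [6, 5, 3, 2, 5, 8, 9, 7, 2, 8],
                     'x2': range(9, 19),
                     'group1': ['A', 'B', 'A', 'A', 'C', 'C', 'A', 'C', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']})

# Row count and sample variance of x1 for each combination of group1 and group2
summary = data.groupby(['group1', 'group2'])['x1'].agg(['count', 'var'])
print(summary)

# Turn the group labels from the MultiIndex back into regular columns
flat = summary.reset_index()
print(flat)
```

The combinations with a count of 1 are exactly the ones with a NaN variance. The reset_index step is optional, but a flat table is often easier to filter or export than a MultiIndex.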

## Further Resources & Summary

You could also have a look at the related tutorials that I have published on my website:

- Max & Min by Group in Python
- Standard Deviation by Group in Python
- Calculate Mean by Group in Python
- Calculate Sum by Group in Python
- Slice pandas DataFrame by Index in Python
- Rename Columns of pandas DataFrame in Python
- Create Subset of Columns of pandas DataFrame in Python
- Rename Column of pandas DataFrame by Index in Python
- How to Use the pandas Library in Python
- Python Programming Examples

In summary: In this article, I have demonstrated how to **aggregate the values of a pandas DataFrame by a group indicator** in the Python programming language. In case you have further questions, please let me know in the comments section below.