Group by Two & Multiple Columns of pandas DataFrame in Python (2 Examples)

 

On this page you’ll learn how to group a pandas DataFrame by two or more columns in the Python programming language.

The tutorial is structured as follows:

So now the part you have been waiting for – the examples.

 

Example Data & Libraries

First, we need to import the pandas library:

import pandas as pd                                              # Import pandas library in Python

Furthermore, have a look at the following example data:

data = pd.DataFrame({'x1':[6, 1, 3, 2, 5, 5, 1, 9, 7, 2, 3, 9],  # Create pandas DataFrame
                     'x2':range(7, 19),
                     'group1':['A', 'B', 'B', 'A', 'C', 'A', 'A', 'C', 'B', 'C', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
                     'group3':['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z']})
print(data)                                                      # Print pandas DataFrame

 

table 1 DataFrame group two multiple columns pandas dataframe python

 

Table 1 shows the structure of our example DataFrame – It contains twelve rows and five columns. The variables group1, group2, and group3 will be used as group indicators in the following examples.

 

Example 1: GroupBy pandas DataFrame Based On Two Group Columns

Example 1 shows how to group the values in a pandas DataFrame based on two group columns.

To accomplish this, we can use the groupby function as shown in the following Python codes.

The syntax below returns the mean values by group using the variables group1 and group2 as group indicators.

print(data.groupby(['group1', 'group2']).mean())                 # Get mean by two groups
#                      x1         x2
# group1 group2                     
# A      a       4.333333   9.666667
#        b       5.000000  15.500000
# B      a       2.000000   8.500000
#        b       5.000000  16.000000
# C      a       5.000000  11.000000
#        b       5.500000  15.000000

Similar to that, we can calculate other descriptive statistics for the value columns by group such as the maximum values…

print(data.groupby(['group1', 'group2']).max())                  # Get maxima by two groups
#                x1  x2 group3
# group1 group2               
# A      a        6  12      z
#        b        9  18      z
# B      a        3   9      z
#        b        7  17      z
# C      a        5  11      y
#        b        9  16      y

…or the addition of all values by group:

print(data.groupby(['group1', 'group2']).sum())                  # Get sum by two groups
#                x1  x2
# group1 group2        
# A      a       13  29
#        b       10  31
# B      a        4  17
#        b       10  32
# C      a        5  11
#        b       11  30

 

Example 2: GroupBy pandas DataFrame Based On Multiple Group Columns

In Example 1, we have created groups and subgroups using two group columns.

Example 2 demonstrates how to use more than two (i.e. three) variables to group our data set.

For this, we simply have to specify another column name within the groupby function.

The following Python codes calculate the mean by three group columns…

print(data.groupby(['group1', 'group2', 'group3']).mean())       # Get mean by three groups
#                        x1    x2
# group1 group2 group3           
# A      a      x       4.0   8.5
#               z       5.0  12.0
#        b      x       1.0  13.0
#               z       9.0  18.0
# B      a      y       1.0   8.0
#               z       3.0   9.0
#        b      y       3.0  17.0
#               z       7.0  15.0
# C      a      y       5.0  11.0
#        b      x       2.0  16.0
#               y       9.0  14.0

…the maxima by group…

print(data.groupby(['group1', 'group2', 'group3']).max())        # Get maxima by three groups
#                       x1  x2
# group1 group2 group3        
# A      a      x        6  10
#               z        5  12
#        b      x        1  13
#               z        9  18
# B      a      y        1   8
#               z        3   9
#        b      y        3  17
#               z        7  15
# C      a      y        5  11
#        b      x        2  16
#               y        9  14

…and the sum by group:

print(data.groupby(['group1', 'group2', 'group3']).sum())        # Get sum by three groups
#                       x1  x2
# group1 group2 group3        
# A      a      x        8  17
#               z        5  12
#        b      x        1  13
#               z        9  18
# B      a      y        1   8
#               z        3   9
#        b      y        3  17
#               z        7  15
# C      a      y        5  11
#        b      x        2  16
#               y        9  14

 

Video & Further Resources

Do you need more info on the examples of this tutorial? Then I can recommend watching the following video instruction which I have published on the Statistics Globe YouTube channel. I’m explaining the Python codes of this article in the video:

 

The YouTube video will be added soon.

 

Furthermore, you may read some of the related posts on my homepage.

 

To summarize: In this article you have learned how to group the values in a pandas DataFrame by two or more columns in the Python programming language. Please let me know in the comments, in case you have any additional questions or comments. Furthermore, please subscribe to my email newsletter for updates on new tutorials.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
Top