Standard Deviation by Group in Python (2 Examples)

 

In this Python tutorial you’ll learn how to find the standard deviation by group.

Let’s dive right into the examples…

 

Example Data & Software Libraries

To be able to use the functions of the pandas library, we first need to import pandas:

import pandas as pd                                              # Import pandas library

We also need to create some example data:

data = pd.DataFrame({'x1':[5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],  # Create pandas DataFrame
                     'x2':range(51, 63),
                     'x3':range(10, 22),
                     'group1':['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})
print(data)                                                      # Print pandas DataFrame

 

Table 1: The example pandas DataFrame (output of print(data)).

 

Table 1 shows that the example pandas DataFrame consists of twelve rows and five columns. The columns x1, x2, and x3 contain integer values, and the variables group1 and group2 will be used as group and subgroup indicators.

 

Example 1: Standard Deviation by Group in pandas DataFrame

This example demonstrates how to calculate the standard deviation by group using one group indicator.

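To accomplish this, we have to call the groupby function on our DataFrame, specify the column we want to use to group our data (i.e. group1), and apply the std function to the result. The argument numeric_only=True restricts the calculation to the numeric columns, so that the text column group2 is ignored (recent pandas versions no longer drop such columns automatically):

print(data.groupby('group1').std(numeric_only=True))  # Get standard deviation by group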
#               x1        x2        x3
# group1                              
# A       2.738613  3.563706  3.563706
# B       3.078342  3.903600  3.903600

The previous console output shows our result, i.e. a standard deviation value for each group and each column in our data set.
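
The same pattern can be restricted to selected columns. The following is a minimal sketch, assuming the example data from above; the variable name sd_selected is only illustrative and not part of the original tutorial. Selecting the numeric columns explicitly also keeps the text column group2 out of the calculation.

```python
import pandas as pd

# Recreate the example data of this tutorial
data = pd.DataFrame({'x1':[5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2':range(51, 63),
                     'x3':range(10, 22),
                     'group1':['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

# Select the columns of interest before calling std
# (sd_selected is a hypothetical name used for illustration)
sd_selected = data.groupby('group1')[['x1', 'x2']].std()
print(sd_selected)
```

The result contains one row per group and only the columns x1 and x2.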

Please be aware that this result reflects the sample standard deviation, because the std function of pandas uses ddof = 1 by default (i.e. it divides by n - 1). If you want to calculate the population standard deviation, you would have to set the ddof argument within the std function to 0.
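
To see which divisor a given call uses, the ddof argument can be set explicitly and compared against a manual calculation. The following is a minimal sketch for the column x1, assuming the example data from above; the variable names sample_sd, population_sd, and a_values are only illustrative.

```python
import numpy as np
import pandas as pd

# Recreate the example data of this tutorial
data = pd.DataFrame({'x1':[5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2':range(51, 63),
                     'x3':range(10, 22),
                     'group1':['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

grouped = data.groupby('group1')['x1']

sample_sd = grouped.std(ddof=1)        # Divides by n - 1 (sample standard deviation)
population_sd = grouped.std(ddof=0)    # Divides by n (population standard deviation)
print(sample_sd)
print(population_sd)

# Cross-check group A by hand with NumPy
a_values = data.loc[data['group1'] == 'A', 'x1'].to_numpy()
print(np.std(a_values, ddof=1))        # Matches sample_sd['A']
print(np.std(a_values, ddof=0))        # Matches population_sd['A']
```

For group A, the values of x1 are 5, 2, 9, 6, and 8, so the sum of squared deviations from the mean of 6 is 30. Dividing by 4 gives a standard deviation of about 2.7386, while dividing by 5 gives about 2.4495.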

 

Example 2: Standard Deviation by Group & Subgroup in pandas DataFrame

This example explains how to use multiple group and subgroup indicators to calculate a standard deviation by group.

For this task, we have to specify a list of all group indicator variables within the groupby function:

print(data.groupby(['group1', 'group2']).std())      # Get standard deviation by multiple groups
#                      x1        x2        x3
# group1 group2                              
# A      a       2.121320  0.707107  0.707107
#        b       1.527525  2.645751  2.645751
# B      a       1.892969  2.380476  2.380476
#        b       2.516611  1.000000  1.000000

As you can see, the execution of the previous code has printed a DataFrame of standard deviations with one row for each combination of group1 and group2.
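
The result of a groupby with several indicators carries a MultiIndex. The following is a minimal sketch of two ways to work with it, assuming the example data from above; the variable names sd_multi, sd_flat, and sd_A_b are only illustrative.

```python
import pandas as pd

# Recreate the example data of this tutorial
data = pd.DataFrame({'x1':[5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2':range(51, 63),
                     'x3':range(10, 22),
                     'group1':['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

sd_multi = data.groupby(['group1', 'group2']).std()   # Result with MultiIndex

# Extract a single value by its (group1, group2) combination
sd_A_b = sd_multi.loc[('A', 'b'), 'x1']
print(sd_A_b)

# Convert the index levels back to ordinary columns
sd_flat = sd_multi.reset_index()
print(sd_flat)
```

The flattened version has the columns group1, group2, x1, x2, and x3, which can be convenient when the result is exported or merged with other data.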

 

Video & Further Resources


To summarize: At this point you should have learned how to calculate the standard deviation by group in the Python programming language.

 
