Standard Deviation by Group in Python (2 Examples)
In this Python tutorial you’ll learn how to find the standard deviation by group.
Let’s dive right into the examples…
Example Data & Software Libraries
To use the functions of the pandas library, we first need to import pandas:
import pandas as pd # Import pandas library
We also need to create some example data:
data = pd.DataFrame({'x1':[5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],    # Create pandas DataFrame
                     'x2':range(51, 63),
                     'x3':range(10, 22),
                     'group1':['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2':['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})
print(data)                                                        # Print pandas DataFrame
Table 1 shows that the example pandas DataFrame consists of twelve rows and five columns. The columns x1, x2, and x3 contain integer values, and the variables group1 and group2 will be used as group and subgroup indicators.
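Before grouping, it can help to confirm the structure of the data. The following is a minimal sketch that rebuilds the example DataFrame and inspects its shape, column types, and the number of rows per group; the variable name group_sizes is my own choice for illustration.

```python
import pandas as pd

# Rebuild the example data of this tutorial
data = pd.DataFrame({'x1': [5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2': range(51, 63),
                     'x3': range(10, 22),
                     'group1': ['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

print(data.shape)                                # Rows and columns: (12, 5)
print(data.dtypes)                               # Data type of each column

group_sizes = data.groupby('group1').size()      # Number of rows in each group
print(group_sizes)                               # A has 5 rows, B has 7 rows
```

Knowing the group sizes is useful later on, because the standard deviation of a very small group is based on only a handful of values.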
Example 1: Standard Deviation by Group in pandas DataFrame
This example demonstrates how to calculate the standard deviation by group using one group indicator.
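To accomplish this, we have to call the groupby function on our DataFrame with the column we want to use to group our data (i.e. group1), and then apply the std function:
print(data.groupby('group1').std(numeric_only=True))   # Get standard deviation by group
#               x1        x2        x3                  # (numeric_only=True skips the text column group2,
# group1                                                #  which raises an error in recent pandas versions)
# A       2.738613  3.563706  3.563706
# B       3.078342  3.903600  3.903600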
The previous console output shows our result, i.e. a standard deviation value for each group and each column in our data set.
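If only one column is of interest, the column can be selected before the standard deviation is computed. The sketch below assumes the example data from above; the name sd_x1 is hypothetical and used only for illustration. The result is a pandas Series with one value per group.

```python
import pandas as pd

# Rebuild the example data of this tutorial
data = pd.DataFrame({'x1': [5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2': range(51, 63),
                     'x3': range(10, 22),
                     'group1': ['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

sd_x1 = data.groupby('group1')['x1'].std()       # Standard deviation of x1 by group
print(sd_x1)

# Several statistics for the same column at once
print(data.groupby('group1')['x1'].agg(['mean', 'std', 'count']))
```

Selecting the numeric columns explicitly also avoids any problems with non-numeric columns such as group2.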
Please be aware that this result reflects the sample standard deviation, since the std function of pandas uses ddof = 1 by default (i.e. it divides by n - 1). If you want to calculate the population standard deviation, you would have to specify the ddof argument within the std function to be equal to 0.
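The effect of the ddof argument can be checked directly. The following sketch compares ddof = 1 (division by n - 1, i.e. the sample standard deviation) with ddof = 0 (division by n, i.e. the population standard deviation) and cross-checks one value against NumPy. The variable names sample_sd, pop_sd, and x1_a are my own and not part of the original examples.

```python
import numpy as np
import pandas as pd

# Rebuild the example data of this tutorial
data = pd.DataFrame({'x1': [5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2': range(51, 63),
                     'x3': range(10, 22),
                     'group1': ['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

grouped = data.groupby('group1')[['x1', 'x2', 'x3']]   # Group numeric columns by group1

sample_sd = grouped.std(ddof=1)                  # Sample standard deviation (pandas default)
pop_sd = grouped.std(ddof=0)                     # Population standard deviation
print(sample_sd)
print(pop_sd)

# Cross-check group A of x1 with NumPy, where np.std uses ddof=0 by default
x1_a = data.loc[data['group1'] == 'A', 'x1'].to_numpy()
print(np.std(x1_a))                              # Population version, equals sqrt(6)
print(np.std(x1_a, ddof=1))                      # Sample version, equals sqrt(7.5)
```

For group A, the values of x1 are 5, 2, 9, 6, and 8 with a mean of 6 and a sum of squared deviations of 30. Dividing by 4 gives the sample variance 7.5, and dividing by 5 gives the population variance 6.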
Example 2: Standard Deviation by Group & Subgroup in pandas DataFrame
This example explains how to use multiple group and subgroup indicators to calculate a standard deviation by group.
For this task, we have to specify a list of all group indicator variables within the groupby function:
print(data.groupby(['group1', 'group2']).std())        # Get standard deviation by multiple groups
#                      x1        x2        x3
# group1 group2
# A      a       2.121320  0.707107  0.707107
#        b       1.527525  2.645751  2.645751
# B      a       1.892969  2.380476  2.380476
#        b       2.516611  1.000000  1.000000
As you can see, the execution of the previous code has printed a DataFrame of standard deviations, with one row for each combination of group1 and group2.
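The result of a groupby with several indicators has a MultiIndex, i.e. each row is identified by a pair of group1 and group2 values. The sketch below shows a few common ways to work with such a result; the names sd, sd_flat, and sd_wide are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Rebuild the example data of this tutorial
data = pd.DataFrame({'x1': [5, 1, 5, 2, 1, 2, 9, 6, 9, 4, 7, 8],
                     'x2': range(51, 63),
                     'x3': range(10, 22),
                     'group1': ['B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'A'],
                     'group2': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']})

sd = data.groupby(['group1', 'group2'])[['x1', 'x2', 'x3']].std()

print(sd.loc[('A', 'a')])                        # Row for group1 == 'A' and group2 == 'a'
print(sd.loc[('B', 'b'), 'x2'])                  # Single value for one subgroup and column

sd_flat = sd.reset_index()                       # Turn group indicators into regular columns
print(sd_flat)

sd_wide = sd['x1'].unstack()                     # group1 as rows, group2 as columns (x1 only)
print(sd_wide)
```

The flat version is often easier to export or merge, while the wide version makes it simple to compare subgroups side by side.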
Video & Further Resources
Would you like to know more about the calculation of the standard deviation by group? Then you could have a look at the following video that I have published on my YouTube channel. I show the examples of this tutorial in the video instruction.
Furthermore, you might have a look at the related tutorials on https://statisticsglobe.com/:
- pandas Library Tutorial in Python
- Standard Deviation in Python
- Standard Deviation of NumPy Array
- stdev & pstdev Functions of statistics Module
- Variance in Python
- Introduction to Python
To summarize: At this point you should have learned how to calculate the standard deviation by group in the Python programming language. Please let me know in the comments section if you have further comments or questions. Furthermore, please subscribe to my email newsletter to get updates on the newest articles.