How to Draw a plotly Barplot in Python (Example)
This article contains several examples on how to create barplots using the plotly library in Python.
Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. His Ph.D. is in Industrial-Organizational Psychology.
Overview
Barcharts are one of the most commonly used types of plots. They typically show how a numerical property varies across categorical groups.
For example, we could use a bargraph to show how average scores in a university lecture (the numeric value) differ between the USA, Germany, Italy, India, and Mexico (the categorical groups).
Usually, one separate bar is shown for each category, and the length of each bar is proportional to its numeric value. Taller bars typically represent larger values.
Modules and Example Data
If you have not already done so, please install and load the packages below:
import pandas as pd
import plotly.express as px
We’ll use the diamonds dataset for this example, which can be downloaded directly from GitHub. We’ll download the full file (approx. 50,000 rows) into df_raw and then create a summary DataFrame called df_groups.
df_raw = pd.read_csv('https://github.com/tidyverse/ggplot2/raw/c9e6304272a2c76af2c4146f8ce97afd0e537684/data-raw/diamonds.csv')

# Average every numeric column within each cut. numeric_only=True skips the
# text columns (color, clarity), which recent pandas versions refuse to average.
df_groups = df_raw.groupby('cut').mean(numeric_only=True).reset_index()
df_groups
#    cut     carat      depth      table        price         x         y         z
#   Fair  1.046137  64.041677  59.053789  4358.757764  6.246894  6.182652  3.982770
#   Good  0.849185  62.365879  58.694639  3928.864452  5.838785  5.850744  3.639507
#  Ideal  0.702837  61.709401  55.951668  3457.541970  5.507451  5.520080  3.401448
# etc...
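The grouped mean can be tried out without downloading anything. The sketch below uses a tiny made-up frame (the values are illustrative and not taken from the diamonds data) and assumes pandas 2.0 or later, where mean() raises a TypeError on text columns unless numeric_only=True is passed:

```python
import pandas as pd

# Made-up rows: two numeric columns plus a text column ("color")
toy = pd.DataFrame({
    "cut": ["Fair", "Fair", "Ideal", "Ideal"],
    "color": ["E", "J", "E", "D"],
    "carat": [1.0, 1.2, 0.5, 0.7],
    "price": [4000, 4400, 3000, 3600],
})

# numeric_only=True drops "color" instead of trying to average it
toy_groups = toy.groupby("cut").mean(numeric_only=True).reset_index()
print(toy_groups)
```

The result keeps one row per cut and only the numeric columns, which is the shape px.bar expects in the examples that follow.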
Basic Barplot
As a first step, let’s create a basic barplot of the average price for each cut of diamond:
fig1 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = 'price'
)
fig1
As you can see in the graphic, premium diamonds have the highest average price! This is surprising, given that “ideal” cuts are a higher grade.
Grouped Barplot
We may also compare each cut of a diamond on multiple values, and this is where grouped barplots are very useful. Grouped barcharts display multiple numeric values for each group on the same chart.
To create such a barplot, we need to pass a list of column names to the y argument and then specify barmode = 'group'.
fig2 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = ['table','depth'],
    barmode = 'group'
)
fig2
Grouped barplots make it easy to compare the different values within the same cut, but not as easy to compare the same value across different cuts.
We might rearrange the order of the bars to optimize a different set of comparisons by grouping the same numeric values together, so it’s easier to compare different cuts.
This is the point where DataFrames that are organized into a “long” format are easier to work with. Let’s create such a long data object called df_long:
# Move every metric column into rows: one row per cut/metric pair
df_long = df_groups.set_index('cut').stack().reset_index()
df_long.columns = ['cut', 'metric', 'value']
df_long
#  cut  metric        value
# Fair   carat     1.046137
# Fair   depth    64.041677
# Fair   table    59.053789
# Fair   price  4358.757764
# etc...
We’ll use this DataFrame in our next chart, but will still only chart the depth and table variables.
fig3 = px.bar(
    data_frame = df_long[(df_long.metric == 'depth') | (df_long.metric == 'table')],
    x = 'metric',
    y = 'value',
    color = 'cut',
    barmode = 'group'
)
fig3
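The wide-to-long reshaping used for this chart can also be written with pandas' melt, which some readers find easier to follow than set_index plus stack. This is an alternative sketch, not the approach used in this article. It uses the depth and table values printed for df_groups earlier, and the names wide, long_stack, and long_melt are introduced here for illustration:

```python
import pandas as pd

# Depth and table for three cuts, copied from the df_groups printout
wide = pd.DataFrame({
    "cut": ["Fair", "Good", "Ideal"],
    "depth": [64.041677, 62.365879, 61.709401],
    "table": [59.053789, 58.694639, 55.951668],
})

# Approach from the article: stack the metric columns into rows
long_stack = wide.set_index("cut").stack().reset_index()
long_stack.columns = ["cut", "metric", "value"]

# Alternative: melt names the new columns in a single call
long_melt = wide.melt(id_vars="cut", var_name="metric", value_name="value")

# Both hold the same rows; only the row order differs
# (stack goes cut by cut, melt goes metric by metric)
key = ["cut", "metric"]
a = long_stack.sort_values(key).reset_index(drop=True)
b = long_melt.sort_values(key).reset_index(drop=True)
print(a.values.tolist() == b.values.tolist())
```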
Stacked Barplot
A popular alternative to the grouped barchart is the so-called stacked barplot. Stacking is most often used to compare cumulative values of multiple subgroups within each group, such as cut and clarity.
In this case, we’ll create and use the df_count DataFrame to see how often each clarity grade is processed into each grade of cut.
# Count the diamonds within each clarity/cut combination
df_count = pd.DataFrame(df_raw.groupby(['clarity','cut']).size().reset_index())
df_count.columns = ['clarity','cut','count']

# Calculate the percentage within each clarity group
df_count['percent'] = (df_count['count']/df_count.groupby('clarity')['count'].transform('sum'))*100
df_count
# clarity        cut  count    percent
#      I1       Fair    210  28.340081
#      I1       Good     96  12.955466
#      I1      Ideal    146  19.703104
#      I1    Premium    205  27.665317
#      I1  Very Good     84  11.336032
#      IF       Fair      9   0.502793
#      IF       Good     71   3.966480
#      IF      Ideal   1212  67.709497
# etc...
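The percentage step relies on transform('sum'), which returns the group total repeated on every row of the group so that it lines up with the count column. A minimal sketch of the mechanics, using made-up counts and hypothetical clarity labels "A" and "B" rather than the diamonds data:

```python
import pandas as pd

# Made-up counts for two hypothetical clarity groups
toy = pd.DataFrame({
    "clarity": ["A", "A", "B", "B", "B"],
    "cut": ["Fair", "Ideal", "Fair", "Good", "Ideal"],
    "count": [30, 90, 10, 40, 50],
})

# One total per row: 120 for every "A" row, 100 for every "B" row
group_total = toy.groupby("clarity")["count"].transform("sum")

# Share of each cut within its own clarity group
toy["percent"] = toy["count"] / group_total * 100
print(toy)

# Sanity check: the shares within each group add up to 100
print(toy.groupby("clarity")["percent"].sum())
```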
To stack the bars, we can specify our attributes and set barmode = 'stack':
fig4 = px.bar(
    data_frame = df_count,
    x = 'clarity',
    y = 'count',
    color = 'cut',
    barmode = 'stack'
)
fig4
This creates stacked bars; however, the total length of each bar is still proportional to the size of its clarity group, which makes the mix of cuts hard to compare between small and large groups.
We may build a “100% stacked” barchart to better compare the proportions across groups.
We can achieve this by plotting the 'percent' column we created earlier:
fig5 = px.bar(
    data_frame = df_count,
    x = 'clarity',
    y = 'percent',
    color = 'cut',
    barmode = 'stack'
)
fig5
Other Modifications
We might rotate our bars so that they are shown horizontally instead of vertically by swapping the fields mapped to x and y.
In the plotly package, the x parameter always describes the information mapped to the horizontal axis, while the y parameter maps to the vertical axis.
fig6 = px.bar(
    data_frame = df_groups,
    y = 'cut',
    x = 'price',
    barmode = 'group'
)
fig6
Finally, you may overlay the actual value of each bar by passing a format string to the text_auto argument. The format used here puts a dollar sign in front of the value and rounds it to 2 decimal places:
fig7 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = 'price',
    barmode = 'group',
    text_auto = '$.2f'
)
fig7
Further Resources
You can check out these other articles for more detailed examples and videos of these popular charts in plotly:
- plotly Boxplot in Python
- plotly Histogram in Python
- plotly Line Plot in Python
- plotly Scatterplot in Python
- Introduction to plotly in Python
- Introduction: Python Programming Language