# How to Draw a plotly Barplot in Python (Example)

This article contains several examples on how to create barplots using the plotly library in Python.

Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. His Ph.D. is in Industrial-Organizational Psychology. You can read more about Kirby here!

## Overview

Barcharts are one of the most commonly used types of plots. They typically show how a numerical property varies across categorical groups.

For example, we could use a bargraph to show how average scores in a university lecture (the numeric value) differ between the USA, Germany, Italy, India, and Mexico (the categorical groups).

Usually, one separate bar is shown for each category, and the length of each bar is proportional to its numeric value. Taller bars typically represent larger values.

## Modules and Example Data

import pandas as pd
import plotly.express as px

We’ll use the diamonds dataset for this example, which can be downloaded directly from GitHub. We’ll download the full file (roughly 54,000 rows) into df_raw and then create a summary DataFrame called df_groups.

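#download the full diamonds dataset
df_raw = pd.read_csv('https://github.com/tidyverse/ggplot2/raw/c9e6304272a2c76af2c4146f8ce97afd0e537684/data-raw/diamonds.csv')

#average every numeric column within each cut
#(numeric_only=True skips text columns such as color and clarity, which recent pandas versions refuse to average)
df_groups = df_raw.groupby('cut').mean(numeric_only=True).reset_index()
df_groups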

# cut    carat     depth      table      price        x         y         z
# Fair   1.046137  64.041677  59.053789  4358.757764  6.246894  6.182652  3.982770
# Good   0.849185  62.365879  58.694639  3928.864452  5.838785  5.850744  3.639507
# Ideal  0.702837  61.709401  55.951668  3457.541970  5.507451  5.520080  3.401448
# etc...
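As a quick sanity check of the grouping step, the sketch below runs the same groupby/mean pattern on a tiny hand-made table instead of the downloaded file. The name toy and all of its values are invented for illustration. It assumes a recent pandas version (2.0 or later), where text columns are no longer dropped silently when averaging, which is why numeric_only=True is passed explicitly.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_raw; the values are invented
toy = pd.DataFrame({
    "cut":   ["Ideal", "Fair", "Ideal", "Good", "Fair"],
    "color": ["E", "F", "G", "E", "D"],  # text column, cannot be averaged
    "carat": [0.2, 0.5, 0.4, 0.3, 0.7],
    "price": [300.0, 500.0, 500.0, 400.0, 700.0],
})

# One row per cut, holding the mean of every numeric column
toy_groups = toy.groupby("cut").mean(numeric_only=True).reset_index()
print(toy_groups)
```

The text column color does not appear in the result, and the groups come back in alphabetical order of cut.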

## Basic Barplot

As a first step, let’s create a basic barplot of the average price for each cut of diamond:

fig1 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = 'price'
)
fig1

As you can see in the graphic, Premium diamonds have the highest average price. This is surprising, given that “Ideal” cuts are a higher grade.

## Grouped Barplot

We may also want to compare each cut of diamond on multiple values, and this is where grouped barplots are very useful. Grouped barcharts display multiple numeric values for each group on the same chart.

To create such a barplot, we need to pass a list of column names to the y argument and then specify barmode = 'group'.

fig2 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = ['table','depth'],
    barmode = 'group'
)
fig2

Grouped barplots make it easy to compare the different values within the same cut, but not as easy to compare the same value across different cuts.

We might rearrange the order of the bars to optimize a different set of comparisons by grouping the same numeric values together, so it’s easier to compare different cuts.

This is the point where DataFrames that are organized into a “long” format are easier to work with. Let’s create such a long data object called df_long:

df_long = df_groups.set_index('cut').stack().reset_index()
df_long.columns = ['cut', 'metric', 'value']
df_long

# cut   metric  value
# Fair  carat   1.046137
# Fair  depth   64.041677
# Fair  table   59.053789
# Fair  price   4358.757764
# etc...
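An alternative way to reach the same long layout is the melt method in pandas. The sketch below is only an illustration on a hand-typed two-row table (the names wide, long_melt, and long_stack are hypothetical); it builds the long table both ways so the results can be compared. The two approaches contain the same rows but order them differently: stack goes row by row, while melt goes column by column.

```python
import pandas as pd

# Small wide table with values copied from the df_groups output
wide = pd.DataFrame({
    "cut": ["Fair", "Good"],
    "depth": [64.041677, 62.365879],
    "table": [59.053789, 58.694639],
})

# Option 1: melt names the new columns directly
long_melt = wide.melt(id_vars="cut", var_name="metric", value_name="value")

# Option 2: the stack approach used in this article
long_stack = wide.set_index("cut").stack().reset_index()
long_stack.columns = ["cut", "metric", "value"]

print(long_melt)
print(long_stack)
```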

We’ll use this DataFrame in our next chart, but will still only chart the depth and table variables.

fig3 = px.bar(
    data_frame = df_long[(df_long.metric == 'depth') | (df_long.metric == 'table')],
    x = 'metric',
    y = 'value',
    color = 'cut',
    barmode = 'group'
)
fig3

## Stacked Barplot

A popular alternative to the grouped barchart is the so-called stacked barplot. Stacking is most often used to compare the cumulative values of multiple subgroups within each group, such as the cuts within each clarity grade.

In this case, we’ll create and use the df_count DataFrame to see how often each clarity grade is processed into each grade of cut.

#count the diamonds within each clarity/cut combination
df_count = df_raw.groupby(['clarity','cut']).size().reset_index(name='count')

#calculate percentage within each clarity group
df_count['percent'] = (df_count['count']/df_count.groupby('clarity')['count'].transform('sum'))*100

df_count

# clarity  cut        count  percent
# I1       Fair       210    28.340081
# I1       Good       96     12.955466
# I1       Ideal      146    19.703104
# I1       Premium    205    27.665317
# I1       Very Good  84     11.336032
# IF       Fair       9      0.502793
# IF       Good       71     3.966480
# IF       Ideal      1212   67.709497
# etc...
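The percentage step relies on transform('sum'), which returns the group total repeated on every row of that group, so it can be divided into the count column directly. The following sketch illustrates this on a reduced table with only two cuts per clarity grade (counts copied from the output above; the names counts, group_totals, and check are hypothetical). Because the other cuts are left out, the percentages differ from those shown above.

```python
import pandas as pd

# Reduced count table: two clarity grades, two cuts each
counts = pd.DataFrame({
    "clarity": ["I1", "I1", "IF", "IF"],
    "cut": ["Fair", "Good", "Fair", "Good"],
    "count": [210, 96, 9, 71],
})

# Total per clarity grade, aligned row by row with the original table
group_totals = counts.groupby("clarity")["count"].transform("sum")
counts["percent"] = counts["count"] / group_totals * 100

# Within each clarity grade the percentages should add up to 100
check = counts.groupby("clarity")["percent"].sum()
print(counts)
print(check)
```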

To stack the bars, we can specify our attributes and set barmode = 'stack':

fig4 = px.bar(
    data_frame = df_count,
    x = 'clarity',
    y = 'count',
    color = 'cut',
    barmode = 'stack'
)

fig4

This creates stacked bars, but the length of each bar is still proportional to the size of each clarity group, which makes it hard to compare the mix of cuts between groups of very different sizes.

We can build a “100% stacked” barchart to compare the proportions across groups more easily.

We can achieve this by plotting the 'percent' column we created earlier:

fig5 = px.bar(
    data_frame = df_count,
    x = 'clarity',
    y = 'percent',
    color = 'cut',
    barmode = 'stack'
)

fig5

## Other Modifications

We might rotate our bars so that they are shown horizontally instead of vertically by swapping the fields mapped to x and y.

In the plotly package, the x parameter always describes the information mapped to the horizontal axis, while the y parameter maps to the vertical axis.

fig6 = px.bar(
    data_frame = df_groups,
    y = 'cut',
    x = 'price',
    barmode = 'group'
)

fig6

Finally, you may overlay the actual value of each bar by passing a format string to the text_auto argument. The code used here adds a dollar sign to the front of the value and rounds it to 2 decimal places:

fig7 = px.bar(
    data_frame = df_groups,
    x = 'cut',
    y = 'price',
    barmode = 'group',
    text_auto = '$.2f'
)

fig7
