How to Draw a plotly Boxplot in Python (Example)

This tutorial provides several examples of plotly boxplots using the Python programming language.


Note: This article was created in collaboration with Kirby White. Kirby is a Statistics Globe author, innovation consultant, and data science instructor. His Ph.D. is in Industrial-Organizational Psychology. You can read more about Kirby here!

Overview

Boxplots are one of the most fundamental statistical charts. Boxplots (sometimes called box and whisker plots) are designed to help us understand the distribution and symmetry of numerical variables. For instance, we could use a boxplot to show the age distribution in a certain country. The box would show the median as well as the 25th and 75th percentiles, and some variations would also visualize the range from min to max or the outliers.

Boxplots display a wealth of information, but can appear complex and difficult to understand when you first encounter them! If you’re not familiar with the structure of boxplots yet, you may have a look here.
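To make the pieces of a box concrete, here is a minimal sketch that computes the numbers a boxplot summarizes, using only the Python standard library. The ages are hypothetical values invented for illustration, and the "inclusive" quartile rule is an assumption; plotly may interpolate quartiles slightly differently.

```python
import statistics

# Hypothetical ages, invented only to illustrate the calculation
ages = [23, 25, 31, 35, 38, 42, 47, 51, 64]

# quantiles(n=4) returns the three cut points: Q1, median, Q3
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

summary = {
    "min": min(ages),
    "q1": q1,          # lower edge of the box (25th percentile)
    "median": median,  # line inside the box
    "q3": q3,          # upper edge of the box (75th percentile)
    "max": max(ages),
}
print(summary)
```

The box spans q1 to q3, the line inside it marks the median, and the whiskers reach toward min and max.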

Modules and Example Data

If you have not already done so, install and load these packages:

from vega_datasets import data
import pandas as pd
import plotly.express as px

We’ll use the iris dataset for this example, which is included with the vega datasets. We’ll save this in a data frame called df.

df = pd.DataFrame(data.iris())
df
 
# 	sepalLength	sepalWidth	petalLength	petalWidth	species
# 0	5.1	3.5	1.4	0.2	setosa
# 1	4.9	3.0	1.4	0.2	setosa
# 2	4.7	3.2	1.3	0.2	setosa

Basic Boxplot

Let’s create a simple boxplot to see the distribution of sepal lengths among these flowers:

fig1 = px.box(
    data_frame = df
    ,y = 'sepalLength'
)
 
fig1.show()

A wonderful feature of the plotly library is the hover info. Try to hover your cursor over the graphic to see what the lines and boxes show in this plot.

This graph shows us the distribution for all the sepals measured in this sample, but it would be more helpful to create a separate box for each species of iris so that we can compare their sepal lengths. We can do that by mapping the species variable to the x-axis:

fig2 = px.box(
    data_frame = df
    ,y = 'sepalLength'
    ,x = 'species'
)
 
fig2.show()

We can see some clear differences between the boxes in our graph! It appears that the virginica species tends to have the longest sepals, but also has a lot of variation. The dot below the virginica box indicates that this particular data point is likely an outlier (i.e., extremely high or low compared to the rest of its group).
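A common convention for flagging such points is the 1.5 × IQR rule: anything more than 1.5 times the interquartile range beyond the box is drawn as a separate dot. The sketch below applies that rule by hand. The values are hypothetical (not taken from the iris data), and the "inclusive" quartile method is an assumption, so the exact cut-offs may differ slightly from what plotly draws.

```python
import statistics

# Hypothetical sepal lengths, invented for illustration
values = [4.9, 6.1, 6.3, 6.4, 6.5, 6.7, 6.9, 7.2, 7.7]

q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Fences sit 1.5 * IQR beyond the edges of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are treated as likely outliers
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(outliers)
```

Here only the lowest value falls outside the fences, so it would appear as a single dot below the box.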

Adding Color

To aid in our comprehension, it can be helpful to use a different color for each species:

fig3 = px.box(
    data_frame = df
    ,y = 'sepalLength'
    ,x = 'species'
    ,color = 'species'
)
 
fig3.show()

Grouped Boxplot

Sometimes you have multiple variables to plot within each group. Plotly prefers that your data be structured in a “long” format for this, so let’s create a second data frame called df_long:

# Keep only three fields from the original data, then reshape from wide to long
df_long = df[['species', 'sepalWidth', 'sepalLength']].set_index('species').stack().reset_index()
df_long.columns = ['species', 'attribute', 'value']
df_long
 
 
# species	attribute	value
# 0	setosa	sepalWidth	3.5
# 1	setosa	sepalLength	5.1
# 2	setosa	sepalWidth	3.0
# 3	setosa	sepalLength	4.9
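The same reshaping can also be written with the pandas melt method, which some readers find easier to follow than stack. The sketch below is an alternative, not the approach used in the rest of this tutorial. It uses a small stand-in data frame (df_small, a hypothetical name) holding the first three rows shown earlier. Note that melt groups rows by attribute instead of interleaving them, which makes no difference to the resulting plot.

```python
import pandas as pd

# Stand-in for df: the first three rows of the iris data shown earlier
df_small = pd.DataFrame({
    "species": ["setosa", "setosa", "setosa"],
    "sepalLength": [5.1, 4.9, 4.7],
    "sepalWidth": [3.5, 3.0, 3.2],
})

# One row per (species, attribute) measurement
df_long_alt = df_small.melt(
    id_vars="species",
    value_vars=["sepalWidth", "sepalLength"],
    var_name="attribute",
    value_name="value",
)
print(df_long_alt)
```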

Let’s see how we can display the width and length of the sepals with this data:

fig4 = px.box(
    data_frame = df_long
    ,y = 'value'
    ,x = 'species'
    ,color = 'attribute'
)
 
fig4.show()

Adding Detail

One critique of boxplots is that they over-summarize the data and may unintentionally mask some details in the underlying data. An easy trick to avoid this is to also include a scatterplot adjacent to each box. This shows much more detail by including each record in addition to the summary provided by the boxplot. Each dot’s position along the y-axis is accurate, while any variation along the x-axis is simply to avoid overlapping the data points. On its own, this type of plot is called a jitter plot or a strip plot. In plotly, we can add the dots with the points argument:

fig5 = px.box(
    data_frame = df_long
    ,y = 'value'
    ,x = 'species'
    ,color = 'attribute'
    ,points='all'
)
 
fig5.show()

Notched Boxplots

Occasionally, you may be interested in the confidence intervals around the median for each group. This is mostly used by researchers looking for statistically significant differences between groups, and is usually best reserved for a technical audience. Nevertheless, plotly makes it easy to include “notches” with each box:

fig6 = px.box(
    data_frame = df
    ,y = 'sepalLength'
    ,x = 'species'
    ,color = 'species'
    ,notched = True
)
 
fig6.show()
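To get a feel for what a notch represents, the interval can be approximated by hand. As far as we know, plotly's reference describes the notch as median ± 1.57 × IQR / sqrt(N); the sketch below applies that formula to hypothetical values, with the "inclusive" quartile method as a further assumption, so treat the result as an approximation of what the plot shows.

```python
import math
import statistics

# Hypothetical sepal lengths, invented for illustration
values = [4.9, 6.1, 6.3, 6.4, 6.5, 6.7, 6.9, 7.2, 7.7]

n = len(values)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Half-width of the notch: 1.57 * IQR / sqrt(N)
half_width = 1.57 * (q3 - q1) / math.sqrt(n)
notch = (median - half_width, median + half_width)
print(notch)
```

As a rule of thumb, if the notches of two boxes do not overlap, their medians are likely to differ.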

Other Customizations

Horizontal Orientation

If you wish to change the orientation so that the boxes run horizontally, you can flip the x and y arguments:

fig7 = px.box(
    data_frame = df
    ,x = 'sepalLength'
    ,y = 'species'
    ,color = 'species'
)
 
fig7.show()

Custom Colors

You can also specify the exact colors to use for each box by passing a dictionary of group:color pairs to the color_discrete_map argument. You can use the name of most colors, or specify a HEX or RGB code, as shown here:

fig8 = px.box(
    data_frame = df
    ,y = 'sepalLength'
    ,x = 'species'
    ,color = 'species'
    ,color_discrete_map={"setosa":"red", "versicolor":"#1d61cf", "virginica":"rgb(20, 150, 96)"}
)
 
fig8.show()

Changing the Box Order

Finally, you can specify the order in which the boxes are displayed with the category_orders argument. This is a dictionary that specifies the name of the column as the key, paired with a list of groups in the order you want them shown:

fig9 = px.box(
    data_frame = df
    ,y = 'sepalLength'
    ,x = 'species'
    ,color = 'species'
    ,color_discrete_map={"setosa":"red", "versicolor":"#1d61cf", "virginica":"rgb(20, 150, 96)"}
    ,category_orders={"species":["versicolor", "virginica", "setosa"]}
)
 
fig9.show()
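Instead of typing the order by hand, it can also be derived from the data, for example by sorting the groups by their median. The sketch below shows one way to do this with pandas; the small data frame holds hypothetical values, and the names df_small, order, and category_orders are chosen only for this illustration.

```python
import pandas as pd

# Hypothetical values, invented for illustration
df_small = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
    "sepalLength": [5.0, 5.2, 5.9, 6.1, 6.5, 6.7],
})

# Median sepal length per species, largest first
medians = df_small.groupby("species")["sepalLength"].median()
order = medians.sort_values(ascending=False).index.tolist()

# Dictionary in the shape expected by the category_orders argument
category_orders = {"species": order}
print(category_orders)
```

The resulting dictionary can then be passed to px.box as category_orders=category_orders.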

Further Resources

You may have a look at these other articles for more detailed examples and videos of popular charts in plotly: