How to Draw a plotly Boxplot in R (Example)
This article provides several examples of boxplots using the plotly package in the R programming language.
Note: This article was created in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University. You can read more about Kirby here!
Overview
Boxplots (sometimes called “box and whisker” plots) are a fundamental type of statistical chart. They are designed to help us understand the distribution and symmetry of numeric data. For instance, we could use a boxplot to show the prices of recent real estate sales. The box would clearly mark the median and the 25th and 75th percentiles. Depending on the variant, the whiskers would show the min/max range, or individual points would be flagged as outliers.
Boxplots display a wealth of information, but they can look complex and intimidating when you first encounter them! If you’re new to boxplots, they are worth reading about in more detail.
If you have not already done so, install and load the plotly package using this code: install.packages("plotly") and library(plotly). The example data ships with base R, so the tidyverse is not required for the code below. It is, however, commonly used alongside plotly and can be installed with install.packages("tidyverse").
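If you run the tutorial code repeatedly (for example, in a script you share with colleagues), a small guard avoids reinstalling plotly every time. This is a minimal sketch using base R's requireNamespace(). The same pattern works for the tidyverse.

```r
# Install plotly only if it is not already available in this R library
if (!requireNamespace("plotly", quietly = TRUE)) {
  install.packages("plotly")
}

# Load plotly so that plot_ly() is available
library(plotly)
```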
Example Data
We’ll use the chickwts dataset for this example, which is preloaded in R. These data come from nutrition research on baby chickens and contain two columns: weight is the weight of a chick (in grams), and feed is the type of food that chick was given.
We can store the data in a data frame with this code:
df <- datasets::chickwts
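Before plotting, it can help to confirm what the data frame contains. The sketch below uses only base R functions (str() and table()). The chickwts data should contain 71 chicks spread across six feed types.

```r
# Load the built-in chick weight data
df <- datasets::chickwts

# Column types: weight is numeric, feed is a factor
str(df)

# Number of chicks in each feed group
table(df$feed)
```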
Basic Boxplot
Let’s create a simple boxplot of the weight of all chicks in the experiment:
plot_ly( data = df, y = ~weight, type = "box" )
One of the great features of plotly is the hover info. Go ahead and hover your cursor over the plot to see what the lines and boxes represent in this plot.
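The hover labels report summary statistics such as the median and the quartiles. To cross-check them, you can compute a five-number summary in base R. Note that plotly and R's quantile() may use different quartile conventions, so the quartiles can differ slightly. The minimum, median and maximum are unambiguous and should agree.

```r
df <- datasets::chickwts

# Minimum, 25th percentile, median, 75th percentile and maximum
# (quantile() uses type = 7 interpolation by default)
weight_summary <- quantile(df$weight, probs = c(0, 0.25, 0.5, 0.75, 1))
print(weight_summary)

# Interquartile range: the height of the box
IQR(df$weight)
```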
This graph shows us the weight distribution for all the chicks, but it would be more helpful to create a separate box for each type of feed that was used. We can do that by mapping the feed variable to the x-axis:
plot_ly( data = df, y = ~weight, x = ~feed, type = "box" )
Now we can see some clear differences between groups! The casein group appears to have the highest median weight, but it also shows a lot of variation. The meatmeal group has the widest range of weights, and the linseed group seems the most symmetrically distributed around its median. The dots above and below the sunflower group mark outliers, i.e., points lying beyond the whiskers, more than 1.5 times the interquartile range away from the box.
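These visual impressions can be checked numerically. The sketch below uses base R's tapply() to compute each group's median and range, and boxplot.stats() to list the sunflower outliers. boxplot.stats() applies R's own 1.5 times IQR rule based on Tukey's hinges, which may not match plotly's quartile method exactly. Treat the result as a cross-check rather than an exact reproduction of the chart.

```r
df <- datasets::chickwts

# Median weight per feed group
group_medians <- tapply(df$weight, df$feed, median)

# Range (max - min) per feed group
group_ranges <- tapply(df$weight, df$feed, function(w) diff(range(w)))

print(sort(group_medians, decreasing = TRUE))
print(sort(group_ranges, decreasing = TRUE))

# Points flagged as outliers by R's 1.5 * IQR rule for the sunflower group
sunflower <- df$weight[df$feed == "sunflower"]
sunflower_outliers <- boxplot.stats(sunflower)$out
print(sunflower_outliers)
```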
Color and Boxpoints
Several modifications can make a boxplot more attractive or show more detail.
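To make the box for each feed a different color, you can add a color argument that is mapped to the same variable. Because the colors simply repeat the x-axis labels, the legend is redundant, so we hide it with showlegend = FALSE: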
plot_ly( data = df, y = ~weight, x = ~feed, type = "box", color = ~feed, showlegend = FALSE )
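plot_ly() chooses the colors automatically, but its colors argument also accepts a ColorBrewer palette name or a vector of colors. The palette "Set2" below is just one illustrative choice, not a recommendation.

```r
library(plotly)
df <- datasets::chickwts

# Map feed to color and pick a qualitative ColorBrewer palette
p <- plot_ly(
  data = df, y = ~weight, x = ~feed, type = "box",
  color = ~feed, colors = "Set2", showlegend = FALSE
)
p
```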
One criticism of the boxplot is that it oversimplifies the data and can hide features of the distribution, such as clusters or gaps. One remedy is to plot each data point as a marker, not just the outliers.
You can add these points to your boxplots by setting boxpoints = "all":
plot_ly( data = df, y = ~weight, x = ~feed, type = "box", boxpoints = "all" )
You can now see each record as a dot on the graph alongside the boxes. This lets you use the boxes for a quick summary and the markers for a more detailed view. Adding the points increases the visual clutter on the graph, so it may not always be helpful.
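When every point is drawn, the markers can overlap heavily. The box trace's jitter attribute spreads the points sideways at random. The pointpos attribute shifts them relative to the box; for vertical boxes, negative values move them to the left. The values 0.3 and -1.8 below are illustrative only.

```r
library(plotly)
df <- datasets::chickwts

# Show all points, jittered horizontally and drawn to the left of each box
p <- plot_ly(
  data = df, y = ~weight, x = ~feed, type = "box",
  boxpoints = "all", jitter = 0.3, pointpos = -1.8
)
p
```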
Mean Lines, Notches and Orientation
The solid line across each box shows the median value for the group. It can sometimes be useful to also plot the mean as a dashed line, because a large gap between the median and the mean can quickly reveal outliers or a skewed distribution.
Here’s an example:
plot_ly( data = df, y = ~weight, x = ~feed, type = "box", boxmean = TRUE )
This makes it even clearer that the linseed group is roughly symmetric (its mean and median nearly coincide), while the casein group is skewed toward lower values (its mean sits below its median).
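To put numbers on this comparison, the sketch below subtracts each group's median from its mean using base R's tapply(). A clearly negative difference suggests that low values pull the mean down, as for casein. A difference near zero is consistent with a roughly symmetric group such as linseed. This is a rough heuristic, not a formal test of skewness.

```r
df <- datasets::chickwts

group_means   <- tapply(df$weight, df$feed, mean)
group_medians <- tapply(df$weight, df$feed, median)

# Positive: mean above median; negative: mean below median
mean_minus_median <- round(group_means - group_medians, 1)
print(mean_minus_median)
```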
For even more statistical information, some people prefer a “notched” box, which indicates a confidence interval for the median. Confidence intervals are one way statisticians recognize and quantify the uncertainty inherent in their data, and they allow them to build models with appropriate margins of error.
You can notch your boxes with a simple command:
plot_ly( data = df, y = ~weight, x = ~feed, type = "box", notched = TRUE )
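According to plotly's reference documentation, the notch spans the median plus or minus 1.57 * IQR / sqrt(n), where n is the group size. The same documentation states that non-overlapping notches indicate roughly 95% confidence that the two medians differ. The sketch below approximates those bounds in base R. The helper name notch_bounds is hypothetical. Because R's IQR() uses its default quantile type, the bounds may differ slightly from what plotly draws.

```r
df <- datasets::chickwts

# Hypothetical helper: approximate notch bounds, median +/- 1.57 * IQR / sqrt(n)
notch_bounds <- function(w) {
  med  <- median(w)
  half <- 1.57 * IQR(w) / sqrt(length(w))
  c(lower = med - half, median = med, upper = med + half)
}

# One row per feed group, with columns lower, median and upper
notches <- t(sapply(split(df$weight, df$feed), notch_bounds))
print(round(notches, 1))
```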
Finally, if you prefer horizontal boxes, you can change the orientation of your boxplots by switching the x and y variables:
plot_ly( data = df, x = ~weight, y = ~feed, type = "box" )
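plotly orders a categorical axis by the factor's levels, which for chickwts are alphabetical. To sort the horizontal boxes by median weight instead, one option is to reorder the factor with base R's reorder() before plotting. The name df_sorted is just for illustration.

```r
library(plotly)
df <- datasets::chickwts

# Reorder feed levels from lowest to highest median weight
df_sorted <- df
df_sorted$feed <- reorder(df_sorted$feed, df_sorted$weight, FUN = median)

p <- plot_ly(data = df_sorted, x = ~weight, y = ~feed, type = "box")
p
```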
Video, Further Resources & Summary
Check out this video for a tutorial of building these boxplots in plotly:
You can check out these other articles for more detailed examples of these popular charts in plotly:
- Introduction to the plotly Package in R
- plotly Barplot in R
- plotly Line Plot in R
- plotly Histogram in R
- plotly Scatterplot in R
- plotly Heatmap in R
In summary: In this tutorial, you have learned how to draw interactive boxplots using the plotly package in R programming. Let us know in the comments section below if you have any questions about creating graphics in R.