How to Draw a plotly Boxplot in R (Example)


This article provides several examples of boxplots using the plotly package in the R programming language.


Note: This article was created in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University. You can read more about Kirby here!


Overview

Boxplots (sometimes called “box and whisker” plots) are a fundamental type of statistical chart. They are designed to display the distribution and symmetry of numeric data. For instance, we could use a boxplot to show the prices of recent real estate sales. The box clearly marks the median and the 25th and 75th percentiles, and the whiskers extend beyond the box toward the minimum and maximum. Depending on the variant, the whiskers cover the full min/max range or stop at the most extreme non-outlying values, with outliers drawn as individual points.
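As a quick illustration of the numbers a boxplot is built from, the sketch below computes them in base R for the chick weights used later in this tutorial (the chickwts data that ships with R). Treat it as an approximation of what plotly draws: plotly computes quartiles with its own default method (controlled by its quartilemethod attribute), so its box edges can differ slightly from the values returned by quantile().

```r
# Five-number summary that a basic boxplot is built from
# (uses the chickwts data that ships with base R)
w <- datasets::chickwts$weight

# Minimum, 25th percentile, median, 75th percentile, maximum
q <- quantile(w, probs = c(0, 0.25, 0.5, 0.75, 1))
print(q)

# The height of the box is the interquartile range (IQR)
iqr <- unname(q[4] - q[2])
print(iqr)
```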

Boxplots display a wealth of information, but they can appear complex and intimidating when you first encounter them! If you’re new to boxplots, they are worth reading about in more detail.

If you have not already done so, install and load the plotly package using this code: install.packages("plotly") and library(plotly). The example data ships with base R, so nothing else is required; the tidyverse, which you can install with install.packages("tidyverse"), is useful for preparing data but is not needed for the code in this tutorial.


Example Data

We’ll use the chickwts dataset for this example, which is preloaded in R. The data come from an experiment on the growth of newly hatched chicks and contain two columns: weight is the chick’s weight in grams after six weeks, and feed is the type of feed supplement that chick was given.

We can store the data in a data frame with this code:

df <- datasets::chickwts
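Before plotting, it can be worth checking the structure of the data and the number of chicks in each feed group, since boxes drawn from only a handful of observations are less reliable. A minimal base R sketch:

```r
df <- datasets::chickwts

# Structure: a numeric weight column and a factor feed column
str(df)

# Number of chicks per feed group (one box will be drawn per group)
counts <- table(df$feed)
print(counts)
```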


Basic Boxplot

Let’s create a simple boxplot of the weight of all chicks in the experiment:

plot_ly(
  data = df,
  y = ~weight,
  type = "box"
)


One of the great features of plotly is the hover info. Go ahead and hover your cursor over the plot to see what the lines and boxes represent in this plot.

This graph shows us the weight distribution for all the chicks, but it would be more helpful to create a separate box for each type of feed that was used. We can do that by mapping the feed variable to the x-axis:

plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box"
)


Now we can see some clear differences between groups! The casein group appears to have the highest median weight, but it also shows a lot of variation. The meatmeal group has the widest range of weights, and the linseed group seems to be the most symmetrically distributed around its median. The dots above and below the sunflower group mark outliers, i.e., values lying beyond the whiskers (by default, more than 1.5 times the interquartile range above or below the box).
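The visual impressions above can be double-checked numerically. The sketch below computes the median and range of each feed group and flags outliers in the sunflower group with base R’s boxplot.stats(). Note that boxplot.stats() uses Tukey’s hinges (via fivenum()) with the 1.5 × IQR rule, which is close to, but not necessarily identical to, plotly’s default quartile method, so the flagged points may occasionally differ from the dots plotly draws.

```r
df <- datasets::chickwts

# Median and range (max - min) of weight for each feed group
medians <- tapply(df$weight, df$feed, median)
ranges  <- tapply(df$weight, df$feed, function(x) diff(range(x)))
print(sort(medians, decreasing = TRUE))
print(sort(ranges, decreasing = TRUE))

# Tukey-style outliers for the sunflower group:
# points more than 1.5 * IQR beyond the hinges
sun <- df$weight[df$feed == "sunflower"]
sun_stats <- boxplot.stats(sun)
print(sun_stats$out)
```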


Color and Boxpoints

Several modifications can make a boxplot more appealing or add more detail to it.

To give the box for each feed a different color, add a color argument mapped to the same feed variable. Since the colors repeat the information already shown on the x-axis, the legend is hidden with showlegend = FALSE:

plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box",
  color = ~feed,
  showlegend = FALSE
)


One criticism of the boxplot is that it oversimplifies the data and can hide features of the distribution, such as multiple peaks or gaps. One remedy is to plot each data point as a marker, not just the outliers.

You can add these markers to your boxplots with the boxpoints argument:

plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box",
  boxpoints = "all"
)


You can now see each record as a dot on the graph alongside the boxes. This lets you use the boxes for a quick summary and the markers for a more detailed view. Adding the points increases the visual clutter on the graph, so it may not always be helpful.
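If the markers overlap too much, plotly’s box traces also accept jitter (how far the points are spread sideways, from 0 to 1) and pointpos (where the points sit relative to the box, from -2 to 2). The values below are arbitrary choices for illustration, not recommended defaults; printing p displays the chart.

```r
library(plotly)

df <- datasets::chickwts

# Show every observation, but spread the points sideways (jitter)
# and draw them to the left of each box (pointpos) to reduce overlap
p <- plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box",
  boxpoints = "all",
  jitter = 0.3,     # horizontal spread of the points, from 0 to 1
  pointpos = -1.8   # position of the points relative to the box, from -2 to 2
)
```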


Mean Lines, Notches and Orientation

The solid line bisecting each box shows you the median value for the group. It can sometimes be useful to also plot the mean value as a dashed line, as the distance between the median and mean can quickly indicate outliers or a skewed distribution.

Here’s an example:

plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box",
  boxmean = TRUE
)


This makes it even clearer that the linseed group is roughly symmetric, while the casein group is skewed downwards: its mean lies noticeably below its median.
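The mean-versus-median comparison can also be checked numerically. The sketch below tabulates both statistics per feed group in base R; reading the sign of their difference as a hint of skew is a rule of thumb, not a formal test.

```r
df <- datasets::chickwts

# Mean and median weight per feed group (tapply returns groups in level order)
summary_df <- data.frame(
  mean   = as.vector(tapply(df$weight, df$feed, mean)),
  median = as.vector(tapply(df$weight, df$feed, median)),
  row.names = levels(df$feed)
)

# Negative difference: mean below median, hinting at a long lower tail;
# positive difference: mean above median, hinting at a long upper tail
summary_df$mean_minus_median <- summary_df$mean - summary_df$median
print(round(summary_df, 1))
```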

For even more statistical information, some people prefer a “notched” box, where the notch indicates a confidence interval around the median. Confidence intervals are one way statisticians recognize and quantify the inherent uncertainty in their data, and they allow them to build models with appropriate margins of error. As a rule of thumb, if the notches of two boxes do not overlap, that is rough evidence that their medians differ.

You can notch your boxes with the notched argument:

plot_ly(
  data = df,
  y = ~weight,
  x = ~feed,
  type = "box",
  notched = TRUE
)
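To get a feel for what the notches represent, the sketch below computes approximate notch limits for each feed group. It assumes the formula given in plotly’s reference for the notched attribute, median ± 1.57 × IQR / √n; because R’s IQR() may use a slightly different quartile method than plotly, the limits are approximate. notch_limits() and overlap() are hypothetical helpers written only for this example.

```r
df <- datasets::chickwts

# Hypothetical helper: approximate notch limits, median +/- 1.57 * IQR / sqrt(n)
# (assumes the formula documented for plotly's `notched` option)
notch_limits <- function(x) {
  m <- median(x)
  half_width <- 1.57 * IQR(x) / sqrt(length(x))
  c(lower = m - half_width, median = m, upper = m + half_width)
}

# One row per feed group, columns lower / median / upper
notches <- t(sapply(split(df$weight, df$feed), notch_limits))
print(round(notches, 1))

# Hypothetical helper: do two notch intervals overlap?
overlap <- function(a, b) a["lower"] <= b["upper"] && b["lower"] <= a["upper"]
print(overlap(notches["casein", ], notches["horsebean", ]))
```

Non-overlapping notches, as for casein and horsebean here, are the rough signal of differing medians described above.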


Finally, if you prefer horizontal boxplots, you can change the orientation by switching the x and y variables:

plot_ly(
  data = df,
  x = ~weight,
  y = ~feed,
  type = "box"
)
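The options above can be combined. The sketch below draws horizontal, colored, notched boxes and sets boxmean = "sd", which draws the standard deviation in addition to the dashed mean line. It sets orientation = "h" explicitly rather than relying on plotly to infer the orientation from the swapped x and y variables; this is a design choice for the example, not a requirement stated in this tutorial. Printing p displays the chart.

```r
library(plotly)

df <- datasets::chickwts

# Horizontal, colored boxes with notches and mean +/- standard deviation markers
p <- plot_ly(
  data = df,
  x = ~weight,
  y = ~feed,
  type = "box",
  orientation = "h",  # explicit horizontal orientation
  color = ~feed,      # one color (and one trace) per feed group
  notched = TRUE,     # confidence-interval notch around each median
  boxmean = "sd",     # dashed mean line plus the standard deviation
  showlegend = FALSE  # colors duplicate the y-axis labels
)
```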



Video, Further Resources & Summary

Check out this video for a tutorial of building these boxplots in plotly:



In summary: In this tutorial, you have learned how to draw interactive boxplots using the plotly package in R. Let us know in the comments section below if you have any questions about creating graphics in R.
