Draw plotly Histogram in R (Example)

This article provides several examples of histograms in plotly using the R programming language.

Table of contents:

1) Overview

2) Example Data

3) Basic and Normalized Histograms

4) Multiple Histograms

5) Changing Histogram Properties

6) Combining with Other Distribution Plots

7) Alternative to Histograms

8) Video, Further Resources & Summary

Note: This article was created in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University.

Overview

Histograms are one of the most fundamental statistical charts. They are designed to show the distribution of numerical values. This is important for understanding the range of the data, how meaningful the mean, median, and mode are, and whether there are extreme outliers. Histograms can look very similar to bar plots, but they answer a different question: bar plots compare values across categories, while histograms show how a single numeric variable is distributed.

In a histogram, each numeric value is assigned to a single “bin”. Each bin covers a certain numeric range. The number of bins can vary based on the data, but 30 bins is a common starting point. The number of values in each bin is plotted as a vertical bar, which is why histograms can appear so similar to bar plots.
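The binning step described above can be sketched in base R. This is only an illustration: the values vector and the bin width of 5 are made up for the example and do not come from the tutorial's data.

```r
# Hypothetical values, invented purely for illustration
values <- c(2, 3, 5, 7, 8, 12, 13, 14, 18, 19)

# Four bins of width 5: [0,5), [5,10), [10,15), [15,20)
breaks <- seq(0, 20, by = 5)
bins <- cut(values, breaks = breaks, right = FALSE)

# Count how many values fall into each bin;
# these counts are the bar heights of the histogram
counts <- table(bins)
print(counts)
```

Each value lands in exactly one bin, so the counts always add up to the number of values.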

If you have not already done so, install and load the plotly package using this code: install.packages("plotly") and library(plotly). Some functions and the data used in this tutorial also come from the tidyverse, which you can install with install.packages("tidyverse").

Example Data

We’ll use the midwest dataset for this example, which ships with the ggplot2 package (a part of the tidyverse). You can store it in an object called df to follow along with this tutorial. midwest contains population information for counties in five states in the Midwestern United States.

df <- ggplot2::midwest
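Before plotting, it can help to take a quick look at the data. The following is a minimal sketch using base R functions; it assumes the ggplot2 package is installed and only touches the state and poptotal columns used in this tutorial.

```r
df <- ggplot2::midwest

# Dimensions: one row per county
dim(df)

# Number of counties per state
table(df$state)

# Minimum, quartiles, mean, and maximum of the population column
summary(df$poptotal)
```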

Basic and Normalized Histograms

Let’s create a simple histogram of the population for all counties in the Midwest:

plot_ly(
  data = df,
  x = ~poptotal,
  type = "histogram"
)

This histogram makes it obvious that almost all the counties in the Midwest have fewer than 1 million people, but there are a few with far more! These highly populated counties might be considered outliers, and they skew our data.
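The skew can also be checked numerically. As a rough sketch (the variable names are chosen just for this example), a mean far above the median is a typical sign that a few very large values are pulling the distribution to the right:

```r
pop <- ggplot2::midwest$poptotal

mean_pop <- mean(pop)
median_pop <- median(pop)

# Number of counties with more than 1 million people
n_large <- sum(pop > 1e6)

cat("Mean population:  ", round(mean_pop), "\n")
cat("Median population:", round(median_pop), "\n")
cat("Counties above 1 million:", n_large, "\n")
```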

Histograms make it easy to quickly observe the effect of a transformation on the distribution of a variable. Transformations are beyond the scope of this article, but they can be effective for improving statistical results.

This example shows what the same data looks like after it has been transformed with the log() function.

plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram"
)

Now the distribution looks much closer to a normal distribution! We’ll use the log()-transformed values for the rest of the tutorial.
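One way to put a number on this visual impression is to compare the skewness before and after the transformation. The sketch below uses a small helper written for this example (it is not part of plotly or base R) that computes the moment-based sample skewness; values near 0 indicate a roughly symmetric distribution.

```r
pop <- ggplot2::midwest$poptotal

# Hypothetical helper for this sketch: moment-based sample skewness
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew_raw <- skewness(pop)       # skewness of the raw population values
skew_log <- skewness(log(pop))  # skewness after the log() transformation

cat("Skewness (raw):", round(skew_raw, 2), "\n")
cat("Skewness (log):", round(skew_log, 2), "\n")
```

This is only a quick check of symmetry, not a formal test of normality.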

Multiple Histograms

To see the population distribution of each state in the midwest, we can create several histograms and plot them side by side.

This example takes the df data and groups it by state. Then, the do() function creates a separate plotly object (stored in a column called p) for each group, and the subplot() function stitches them together in the same graphic. Using shareX = TRUE, shareY = TRUE ensures that the horizontal and vertical axes are scaled the same across all plots, so that they use the same visual proportions.

df %>%
  group_by(state) %>%
  do(p = plot_ly(., x = ~log(poptotal), name = ~state, type = "histogram")) %>%
  subplot(nrows = 1, shareX = TRUE, shareY = TRUE)
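The same figure can also be assembled without do(). The following is an alternative sketch, not the approach used above: it splits the data with base R's split(), builds one histogram per state with lapply(), and passes the resulting list to subplot(). The names plots and fig are chosen just for this example.

```r
library(plotly)

df <- ggplot2::midwest

# split() returns a named list with one data frame per state
plots <- lapply(split(df, df$state), function(d) {
  plot_ly(d, x = ~log(poptotal), name = d$state[1], type = "histogram")
})

# subplot() accepts a list of plotly objects
fig <- subplot(plots, nrows = 1, shareX = TRUE, shareY = TRUE)

# Typing `fig` at the console displays the combined plot
```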

Changing Histogram Properties

Most of the time, the default settings of a histogram will work just fine. However, sometimes you may want to change the number or size of bins.

This example shows how to specify the size of each bin and let the algorithm determine how many bins to create. We’ll set the size so that each bar covers a range of 3 units of log(poptotal):

plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  xbins = list(size = 3)
)
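Besides size, the xbins list in plotly also accepts start and end values, which fix where the first bin begins and where the last bin ends. The numbers below (7, 16, and a size of 1) are assumptions picked for this sketch so that the bins cover the full range of log(poptotal) with whole-number edges.

```r
library(plotly)

df <- ggplot2::midwest

fig <- plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  # Bins run from 7 to 16 in steps of 1 on the log scale
  xbins = list(start = 7, end = 16, size = 1)
)

# Typing `fig` at the console displays the plot
```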

This example demonstrates how to control the number of bins and let the algorithm decide the bin size. Note that nbinsx sets the maximum number of bins, so plotly may draw fewer. We’ll ask for up to 10 bins:

plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  nbinsx = 10
)
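Normalization is another property that can be set in the same way. As a minimal sketch, the histnorm argument of plotly's histogram trace changes what the bar heights mean: with histnorm = "probability", each bar shows the share of counties in that bin rather than the raw count.

```r
library(plotly)

df <- ggplot2::midwest

fig <- plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  histnorm = "probability",  # bar heights are proportions that sum to 1
  nbinsx = 10
)

# Typing `fig` at the console displays the plot
```

Normalized histograms are useful when comparing groups of different sizes, because the bars are no longer affected by the number of observations in each group.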

Combining with Other Distribution Plots

Since histograms are used to visualize the distribution of a variable, they are often combined with other distribution plots, such as density plots.

We can use the density() function to create an object storing the density distribution and then overlay it on our histogram.

dens <- density(log(df$poptotal))

plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  name = "Histogram") %>%
  add_lines(x = dens$x, y = dens$y, yaxis = "y2", name = "Density") %>%
  layout(yaxis2 = list(overlaying = "y",      # Adds the dual y-axis
                       side = "right",        # Puts the density axis on the right side
                       rangemode = "tozero")) # Forces the density axis to start at 0
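The second y-axis is needed because the two layers are on different scales: the histogram bars are counts, while the density curve is scaled so that the area under it is approximately 1. A quick sketch to check this numerically (the names step and area are chosen just for this example):

```r
x <- log(ggplot2::midwest$poptotal)
dens <- density(x)

# density() evaluates the curve on an equally spaced grid,
# so a simple Riemann sum approximates the area under it
step <- diff(dens$x[1:2])
area <- sum(dens$y) * step

cat("Approximate area under the density curve:", round(area, 3), "\n")
cat("Highest point of the density curve:", round(max(dens$y), 3), "\n")
```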

Alternative to Histograms

Another common technique for visualizing the distribution of a variable is the violin plot. Violin plots might look strange at first, but they are essentially smoothed histograms (density curves) turned on their side. They can make it easier to display multiple distributions alongside each other, as shown here:

plot_ly(
  data = df,
  x = ~state,
  y = ~log(poptotal),
  type = "violin",
  color = ~state,
  side = "positive",
  meanline = list(visible = TRUE)
)

This plot also includes dots for the outlier values in each state, and a dashed mean line that makes it easy to compare the average log-transformed county population across states.
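The values behind the mean lines can be reproduced with base R. This is a small sketch, assuming the same log() transformation as in the plot; the name mean_log_pop is chosen just for this example.

```r
df <- ggplot2::midwest

# Mean of log(poptotal) per state, matching the scale of the violin plot
mean_log_pop <- tapply(log(df$poptotal), df$state, mean)
print(round(mean_log_pop, 2))

# Back-transforming a mean of logs gives the geometric mean,
# not the ordinary (arithmetic) mean of the county populations
print(round(exp(mean_log_pop)))
```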

Video, Further Resources & Summary

This tutorial has explained how to plot a histogram using the plotly package in R. In case you have any further questions, leave a comment below.
