Draw plotly Histogram in R (Example)
Note: This article was created in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University. You can read more about Kirby here!
Histograms are one of the most fundamental statistical charts. They are designed to show the distribution of numerical values. This is important for understanding the range of the data, how meaningful the average, median, and mode values are, and whether there are extreme outliers. Histograms can look very similar to barplots, but they answer a different question: barplots compare values across categories, while histograms show how a single numeric variable is distributed.
In a histogram, each numeric value is assigned to a single “bin”. Each bin covers a certain numeric range. The number of bins can vary based on the data, but 30 bins is a common starting point. The number of values in each bin is plotted as a vertical bar, which is why histograms can appear so similar to bar plots.
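To make the idea of binning concrete, here is a minimal base R sketch that counts values per bin by hand. The values and bin edges are illustrative assumptions, not taken from the tutorial's data, and plotly chooses its own bin edges automatically.

```r
# Illustrative values only (not taken from the midwest data)
values <- c(1.2, 2.5, 2.9, 3.1, 4.8, 5.0, 5.5, 7.3, 9.9)

# Bin edges: five bins, each 2 units wide
breaks <- seq(0, 10, by = 2)

# Assign every value to exactly one bin; intervals are closed on the left
bins <- cut(values, breaks = breaks, right = FALSE)

# Count the values per bin; these counts become the bar heights
bin_counts <- table(bins)
print(bin_counts)
```

Each value lands in exactly one bin, so the counts always add up to the number of values.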
If you have not already done so, install and load the plotly package using this code:
install.packages("plotly")
library(plotly)
Some functions and the data used in this tutorial also come from the tidyverse, which you can install with install.packages("tidyverse") and load with library(tidyverse).
We’ll use the midwest dataset for this example, which is preloaded in the ggplot2 package (a part of the tidyverse). You can store it in an object called df to follow along with this tutorial. midwest contains population information for several states and counties in the mid-western portion of the United States.
df <- ggplot2::midwest
Basic and Normalized Histograms
Let’s create a simple histogram of the population for all counties in the Midwest:
plot_ly( data = df, x = ~poptotal, type = "histogram" )
This histogram makes it obvious that almost all the counties in the Midwest have fewer than 1 million people, but there are a few with far more! These highly populated counties might be considered outliers, and they skew our data.
Histograms make it very easy to quickly observe the effect of a transformation on the distribution of a variable. Transformations are beyond the scope of this article, but they can be effective for improving statistical results.
This example shows what the same data looks like after it has been transformed with the natural logarithm, log():
plot_ly( data = df, x = ~log(poptotal), type = "histogram" )
Now the distribution looks much closer to a normal distribution! We’ll use the log()-transformed values for the rest of the tutorial.
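One rough way to check the effect of a log transformation numerically is to compare the mean and the median before and after. The sketch below uses simulated right-skewed data as a stand-in (an assumption for illustration, not the midwest data), so the exact numbers carry no meaning for the tutorial's dataset.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated right-skewed "populations" (an assumption, not the midwest data)
pop <- rlnorm(500, meanlog = 10, sdlog = 1.2)

# On the raw scale, a few huge values pull the mean well above the median
raw_ratio <- mean(pop) / median(pop)

# After log(), mean and median nearly coincide, a sign of rough symmetry
log_gap <- abs(mean(log(pop)) - median(log(pop)))

print(c(raw_ratio = raw_ratio, log_gap = log_gap))
```

A mean far above the median points to right skew, while a small gap after the transformation suggests a more symmetric shape. It is only a quick heuristic, not a formal test of normality.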
To see the population distribution of each state in the midwest, we can create several histograms and plot them side by side.
This example takes the df data and groups it by state. Then, the do() function creates a separate plotly object (called p) for each group, and the subplot() function stitches them together in the same graphic. Using shareX = TRUE, shareY = TRUE ensures that the horizontal and vertical axes are scaled the same across each plot so that they use the same visual proportions.
df %>%
  group_by(state) %>%
  do(p = plot_ly(., x = ~log(poptotal), name = ~state, type = "histogram")) %>%
  subplot(nrows = 1, shareX = TRUE, shareY = TRUE)
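The group-then-apply pattern used above can be illustrated without plotly. This is a minimal base R sketch on a hypothetical toy data frame (the `toy` object and its values are invented for illustration): split() stands in for group_by(), and lapply() stands in for do().

```r
# Hypothetical toy data standing in for df (values are made up)
toy <- data.frame(
  state = c("IL", "IL", "OH", "OH", "OH", "WI"),
  poptotal = c(1000, 20000, 5000, 300000, 8000, 45000)
)

# split() plays the role of group_by(): one data frame per state
by_state <- split(toy, toy$state)

# lapply() plays the role of do(): run the same step on every group
per_state <- lapply(by_state, function(d) log(d$poptotal))

# One result per state, analogous to one histogram per state
print(lengths(per_state))
```

In the plotly version, the per-group step builds a histogram instead of a numeric vector, and subplot() arranges the resulting plots in one row.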
Changing Histogram Properties
Most of the time, the default settings of a histogram will work just fine. However, sometimes you may want to change the number or size of bins.
This example shows how to specify the size of each bin and let the algorithm determine how many bins to create. We’ll set the size so that each bar spans a range of 3 on the log scale:
plot_ly( data = df, x = ~log(poptotal), type = "histogram", xbins = list(size = 3) )
This example demonstrates how to specify the number of bins, letting the algorithm decide the bin size. Note that nbinsx sets the maximum number of bins, so plotly may draw fewer. We’ll ask for up to 10 bins:
plot_ly( data = df, x = ~log(poptotal), type = "histogram", nbinsx = 10 )
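Bin size and bin count are two views of the same trade-off: fixing one roughly determines the other. The sketch below shows the arithmetic with an illustrative value range (an assumption, not computed from the data). plotly applies its own rules for choosing bin edges, so the bins it actually draws may differ from this back-of-the-envelope estimate.

```r
# Illustrative log-scale range (an assumption, not computed from midwest)
x_min <- 7.4
x_max <- 15.5

# Fixing the bin width determines roughly how many bins are needed
bin_size <- 3
n_bins_from_size <- ceiling((x_max - x_min) / bin_size)

# Fixing the bin count determines roughly how wide each bin must be
n_bins <- 10
size_from_n_bins <- (x_max - x_min) / n_bins

print(c(n_bins_from_size = n_bins_from_size,
        size_from_n_bins = size_from_n_bins))
```

With a range of about 8 units, a bin size of 3 leaves only a handful of bars, while 10 bins gives bars a little under 1 unit wide.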
Combining with Other Distribution Plots
Since histograms are used to visualize the distribution of a value, they are often combined with other distribution plots, such as density plots.
We can use the density() function to create an object storing the density distribution and then overlay it on our histogram.
dens <- density(log(df$poptotal))

plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  name = "Histogram"
) %>%
  add_lines(x = dens$x, y = dens$y, yaxis = "y2", name = "Density") %>%
  layout(
    yaxis2 = list(
      overlaying = "y",      # Adds the dual y-axis
      side = "right",        # Adds the density axis on the right side
      rangemode = "tozero"   # Starts the density axis at 0, like the histogram axis
    )
  )
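It can help to look at what density() actually returns, since that explains why the overlay uses a second y-axis. The sketch below runs density() on simulated data (a stand-in, not the midwest data) and checks two properties: the default grid of 512 evaluation points, and that the area under the curve is approximately 1.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for log(df$poptotal) (an assumption for illustration)
x <- rnorm(200, mean = 10, sd = 1.5)

# density() returns a list; $x holds grid points and $y the estimated density
dens <- density(x)
n_points <- length(dens$x)

# Approximate the area under the curve with a simple Riemann sum
step <- dens$x[2] - dens$x[1]
area <- sum(dens$y) * step

print(c(n_points = n_points, area = area))
```

Because the density integrates to about 1, its y-values are far smaller than the histogram's counts, so plotting both on one axis would flatten the density line. Hence the separate axis on the right.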
Alternative to Histograms
Another common technique for visualizing the distribution of a value is the violin plot. It might look strange at first, but it is essentially a smoothed density curve turned on its side, much like a sideways histogram. This can be an easier way to display multiple distributions alongside each other, as shown here:
plot_ly(
  data = df,
  x = ~state,
  y = ~log(poptotal),
  type = "violin",
  color = ~state,
  side = "positive",
  meanline = list(visible = TRUE)
)
This type of plot also includes dots that mark the outlier values in each state, and a dashed mean line that makes it easy to compare the average county population (on the log scale) in each state.
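For intuition about which points get flagged as outliers, here is a base R sketch of the common 1.5 × IQR rule using boxplot.stats(). The values are made up, and it is an assumption that this mirrors plotly's behavior closely: plotly computes quartiles with its own method, so the points it marks can differ slightly.

```r
# Made-up values with one obviously extreme observation
x <- c(1:10, 50)

# boxplot.stats() flags points beyond 1.5 * IQR from the hinges
stats <- boxplot.stats(x)
outliers <- stats$out

print(outliers)
```

Only the extreme value is flagged here, while the rest of the values fall inside the whiskers.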
Video, Further Resources & Summary
Check out this video for a tutorial of building these histograms in plotly:
You can check out these other articles for more detailed examples of these popular charts in plotly:
- Introduction to the plotly Package in R
- plotly Line Plot in R
- plotly Scatterplot in R
- plotly Barplot in R
- plotly Boxplot in R
- plotly Heatmap in R
This tutorial has explained how to plot a histogram using the plotly package in R. In case you have any further questions, leave a comment below.