Draw plotly Histogram in R (Example)
This article provides several examples of histograms in plotly using the R programming language.
Note: This article was created in collaboration with Kirby White. Kirby is an organizational effectiveness consultant and researcher who is currently pursuing a Ph.D. at Seattle Pacific University.
Overview
Histograms are one of the most fundamental statistical charts. They show the distribution of numerical values, which helps you understand the range of the data, how meaningful the mean, median, and mode are, and whether there are extreme outliers. Histograms can look very similar to bar plots, but they answer a different question: bar plots compare values across categories, whereas histograms show how a single numeric variable is distributed.
In a histogram, each numeric value is assigned to a single “bin”. Each bin covers a certain numeric range. The number of bins can vary with the data, but 30 bins is a common starting point. The number of values in each bin is plotted as a vertical bar, which is why histograms can look so similar to bar plots.
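To make the idea of binning concrete, here is a minimal base R sketch. The values and bin edges are made up purely for illustration and are not part of the tutorial data. Note that plotting libraries differ in whether a value sitting exactly on a bin edge goes into the lower or the upper bin; cut() uses right-closed intervals by default.

```r
# Illustrative values only (not taken from the midwest data)
values <- c(1, 2, 2, 3, 5, 7, 8, 8, 9, 10)

# Bin edges: three bins covering (0, 4], (4, 8] and (8, 12]
breaks <- c(0, 4, 8, 12)

# cut() assigns every value to exactly one bin
bins <- cut(values, breaks = breaks)

# table() counts the values per bin; these counts are the bar heights
bin_counts <- table(bins)
print(bin_counts)
```

A histogram of these values would therefore show three bars, one per bin, with heights equal to the counts in bin_counts.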
If you have not already done so, install and load the plotly package using install.packages("plotly") and library(plotly). Some functions and the data used in this tutorial also come from the tidyverse, which you can install with install.packages("tidyverse").
Example Data
We’ll use the midwest dataset for this example, which is included in the ggplot2 package (a part of the tidyverse). You can store it in an object called df to follow along with this tutorial. midwest contains population information for counties in several states in the Midwestern United States.
df <- ggplot2::midwest
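Before plotting, it can help to take a quick look at the data. The following sketch only uses base R functions and assumes that the ggplot2 package is installed; the column names state and poptotal are the ones used throughout this tutorial.

```r
df <- ggplot2::midwest

# Number of rows (one row per county) and number of columns
dim(df)

# Number of counties in each state
table(df$state)

# Minimum, quartiles, mean, and maximum of the county populations
summary(df$poptotal)
```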
Basic and Normalized Histograms
Let’s create a simple histogram of the population for all counties in the Midwest:
plot_ly( data = df, x = ~poptotal, type = "histogram" )
This histogram makes it obvious that almost all counties in the Midwest have fewer than 1 million people, but a few have far more! These highly populated counties might be considered outliers, and they are skewing our data.
Histograms make it very easy to quickly observe the effect of a transformation on the distribution of a variable. Transformations are beyond the scope of this article, but they can be effective for improving statistical results.
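If you want to know which counties sit in the long right tail of the histogram, you can filter the data directly. This is a small sketch in base R; the threshold of 1 million simply mirrors the observation above and is not a formal outlier rule.

```r
df <- ggplot2::midwest

# Counties with more than 1 million residents
big <- df[df$poptotal > 1e6, c("county", "state", "poptotal")]

# Sort from the largest to the smallest county
big <- big[order(-big$poptotal), ]
print(big)
```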
This example shows what the same data looks like after it has been transformed with the log() function.
plot_ly( data = df, x = ~log(poptotal), type = "histogram" )
Now the distribution looks much closer to a normal distribution! We’ll use the log() transformed values for the rest of the tutorial.
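One rough way to check the effect of the log transformation numerically is to compare the mean and the median before and after. In a strongly right-skewed variable the mean is pulled far above the median, whereas in a roughly symmetric one the two are close. The sketch below is only an informal check, not a formal test of normality.

```r
df <- ggplot2::midwest

# Raw scale: a few very large counties pull the mean above the median
raw_stats <- c(mean = mean(df$poptotal), median = median(df$poptotal))
print(raw_stats)

# Log scale: mean and median should sit much closer together
log_stats <- c(mean = mean(log(df$poptotal)), median = median(log(df$poptotal)))
print(log_stats)
```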
Multiple Histograms
To see the population distribution of each state in the midwest, we can create several histograms and plot them side by side.
This example takes the df data and groups it by state. Then, the do command creates a separate plotly object (called p) for each group, and the subplot command stitches them together in the same graphic. Using shareX = TRUE, shareY = TRUE ensures that the horizontal and vertical axes are scaled the same across all plots, so that they use the same visual proportions.
df %>% group_by(state) %>% do(p = plot_ly(., x = ~log(poptotal), name = ~state, type = "histogram")) %>% subplot(nrows = 1, shareX = TRUE, shareY = TRUE)
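If you prefer not to use group_by and do, the same figure can be sketched with base R's split() and lapply(), since subplot also accepts a list of plotly objects. This is an alternative formulation, not the approach used in the article; the object names by_state, plots, and fig are chosen here for illustration.

```r
library(plotly)

df <- ggplot2::midwest

# One data frame per state
by_state <- split(df, df$state)

# One histogram per state, named after the state
plots <- lapply(names(by_state), function(st) {
  plot_ly(data = by_state[[st]], x = ~log(poptotal), name = st, type = "histogram")
})

# Combine the histograms in a single row with shared axes
fig <- subplot(plots, nrows = 1, shareX = TRUE, shareY = TRUE)
fig
```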
Changing Histogram Properties
Most of the time, the default settings of a histogram will work just fine. However, sometimes you may want to change the number or size of bins.
This example shows how to specify the size of each bin, and lets the algorithm determine how many bins to create. We’ll set the size so that each bar represents a range of 3 on the log scale:
plot_ly( data = df, x = ~log(poptotal), type = "histogram", xbins = list(size = 3) )
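This example demonstrates how to specify the number of bins, letting the algorithm decide the bin size. Note that nbinsx sets the maximum number of bins: plotly picks a convenient bin size that yields at most that many bins. We’ll ask for up to 10 bins: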
plot_ly( data = df, x = ~log(poptotal), type = "histogram", nbinsx = 10 )
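For full control over the binning, xbins also accepts start and end values in addition to size. The sketch below uses bins of width 0.5 between 7 and 16; these numbers are assumptions chosen to cover the range of log(poptotal), so check them against range(log(df$poptotal)) for your own data.

```r
library(plotly)

df <- ggplot2::midwest

# Check the range of the plotted variable before fixing the bin edges
print(range(log(df$poptotal)))

# Bins of width 0.5, starting at 7 and ending at 16 (assumed to cover the data)
fig <- plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  xbins = list(start = 7, end = 16, size = 0.5)
)
fig
```

Fixing start, end, and size is useful when several histograms should use exactly the same bins, for example when comparing groups.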
Combining with Other Distribution Plots
Since histograms are used to visualize the distribution of a value, they are often combined with other distribution plots, such as density plots.
We can use the density() function to create an object storing the density distribution and then overlay it on our histogram.
dens <- density(log(df$poptotal))
plot_ly(data = df, x = ~log(poptotal), type = "histogram", name = "Histogram") %>%
  add_lines(x = dens$x, y = dens$y, yaxis = "y2", name = "Density") %>%
  layout(yaxis2 = list(overlaying = "y",       # Adds the dual y-axis
                       side = "right",         # Puts the density axis on the right side
                       rangemode = "tozero"))  # Makes the density axis start at 0, like the histogram axis
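An alternative to the dual y-axis is to put the histogram on the same scale as the density curve. The sketch below uses the histnorm attribute of plotly's histogram trace with the value "probability density", so that the bar areas sum to 1 and both traces can share a single y-axis. This is a variation on the example above, not the article's original approach.

```r
library(plotly)

df <- ggplot2::midwest

# Kernel density estimate of the log-transformed population
dens <- density(log(df$poptotal))

# Histogram normalized to a probability density, with the curve on the same axis
fig <- plot_ly(
  data = df,
  x = ~log(poptotal),
  type = "histogram",
  histnorm = "probability density",
  name = "Histogram"
) %>%
  add_lines(x = dens$x, y = dens$y, name = "Density")
fig
```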
Alternative to Histograms
Another common technique for visualizing the distribution of a value is to use a violin plot. A violin plot might look strange at first, but it is essentially a smoothed histogram (a density curve) turned on its side. This can be an easier way to display multiple distributions alongside each other, as shown here:
plot_ly( data = df, x = ~state, y = ~log(poptotal), type = "violin", color = ~state, side = "positive", meanline = list(visible = TRUE) )
This plot also includes dots that mark the outlier values in each state, and a dashed mean line that makes it easy to compare the average (log-transformed) county population across states.
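Violin traces in plotly can also display a small box plot inside each violin via the box attribute, which adds the median and quartiles to the picture. The following sketch is a variation on the example above that draws the full two-sided violins by leaving out the side argument.

```r
library(plotly)

df <- ggplot2::midwest

# Two-sided violins with an embedded box plot and a mean line
fig <- plot_ly(
  data = df,
  x = ~state,
  y = ~log(poptotal),
  type = "violin",
  color = ~state,
  box = list(visible = TRUE),
  meanline = list(visible = TRUE)
)
fig
```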
Video, Further Resources & Summary
You can check out these other articles for more detailed examples of these popular charts in plotly:
- Introduction to the plotly Package in R
- plotly Line Plot in R
- plotly Scatterplot in R
- plotly Barplot in R
- plotly Boxplot in R
- plotly Heatmap in R
This tutorial has explained how to plot a histogram using the plotly package in R. In case you have any further questions, leave a comment below.