Overlay Histogram with Fitted Density Curve in Base R & ggplot2 Package (2 Examples)

In this tutorial you’ll learn how to fit a density plot to a histogram in the R programming language.

Let’s just jump right in!

Introduction of Example Data

In the examples of this R programming tutorial, we’ll use the following example data:

set.seed(18462)                                       # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))
head(data)                                            # Print example data
#     x
# 1   6
# 2   7
# 3  14
# 4   4
# 5 -10
# 6  16

As you can see based on the output of the RStudio console, our example data contains only one numeric column. Now, let’s draw these data…

Example 1: Histogram & Density with Base R

Example 1 explains how to fit a density curve to a histogram with the basic installation of the R programming language. First, we need to use the hist function to draw a histogram:

hist(data$x, prob = TRUE)                             # Create histogram with Base R

Figure 1: Histogram Created with Base R.

Figure 1 shows the output of the previous R code: A histogram without a density line. If we want to add a kernel density to this graph, we can use a combination of the lines and density functions:

lines(density(data$x), col = "red")                   # Overlay density curve

Figure 2: Histogram & Overlaid Density Plot Created with Base R.

Figure 2 illustrates the final result of Example 1: A histogram with a fitted density curve created in Base R.
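
The red line above is a kernel density estimate. If a parametric fit is of interest as well, the following minimal sketch adds a normal curve based on the sample mean and standard deviation, and checks that the bars of a histogram drawn with prob = TRUE have a total area of 1. The object names h, bar_area, x_mean, and x_sd are only illustrative and not part of the original example, and the normal distribution is an assumption about the data:

```r
set.seed(18462)                                       # Recreate example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))

h <- hist(data$x, prob = TRUE)                        # Histogram on density scale
bar_area <- sum(h$density * diff(h$breaks))           # Total area of all bars

x_mean <- mean(data$x)                                # Parameters of normal fit
x_sd <- sd(data$x)

lines(density(data$x), col = "red")                   # Kernel density estimate
curve(dnorm(x, mean = x_mean, sd = x_sd),             # Fitted normal density
      add = TRUE, col = "blue", lty = 2)
```

Because the bars and both curves are on the density scale, they can be compared directly in the same plot.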

Example 2: Histogram & Density with ggplot2 Package

Example 2 shows how to create a histogram with a fitted density plot based on the ggplot2 add-on package. First, we need to install and load ggplot2 to R:

install.packages("ggplot2")                           # Install & load ggplot2
library("ggplot2")

Now, we can use a combination of the ggplot, geom_histogram, and geom_density functions to create our graphic:

ggplot(data, aes(x)) +                                # ggplot2 histogram & density
  geom_histogram(aes(y = after_stat(density))) +      # after_stat() replaces deprecated stat()
  geom_density(col = "red")

Figure 3: Histogram & Overlaid Density Plot Created with ggplot2 Package.

Figure 3 visualizes our histogram and density line created with the ggplot2 package. Note that the histogram bars of Example 1 and Example 2 look slightly different, because the two approaches choose the bins differently by default: geom_histogram uses 30 bins, whereas the hist function of Base R derives the number of bars from Sturges' formula.
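
If the bars of both plots should match, one option is to hand the breaks computed by Base R over to ggplot2. The following is a minimal sketch of this idea; the object names base_breaks and p_aligned are only illustrative, and the code assumes a ggplot2 version that provides after_stat (3.3.0 or later):

```r
library(ggplot2)

set.seed(18462)                                       # Recreate example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))

base_breaks <- hist(data$x, plot = FALSE)$breaks      # Breaks chosen by Base R

p_aligned <- ggplot(data, aes(x)) +                   # Use the same breaks in ggplot2
  geom_histogram(aes(y = after_stat(density)),
                 breaks = base_breaks) +
  geom_density(col = "red")
p_aligned                                             # Draw plot
```

Alternatively, the bins or binwidth arguments of geom_histogram can be set by hand to approximate the Base R bars.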

Video, Further Resources & Summary

Some time ago, I published a video on my YouTube channel that illustrates the content of this tutorial. You can find the video below:

Furthermore, you might want to have a look at some of the related articles which I have published on my homepage.

In this tutorial, I illustrated how to combine histograms with probability on the y-axis and density plots in the R programming language. If you have additional questions or comments, let me know in the comments section below.

15 Comments

Ardy
December 19, 2022 6:58 am

Hello Joachim

Thanks for your nice videos. I have the following R script which is for only one .tsv file. I want to tweak it in a way that can plot (Histogram + line) two similar but separate .tsv files with different colours overlaid on each other. Could you please guide?

# read in data
df = read.csv("your_distribution.tsv", sep="\t")

# filter Ks distribution (0.001 < Ks < 5)
lower_bound = 0.001
upper_bound = 5
df = df[df$Ks < upper_bound & df$Ks > lower_bound,]

# perform node-averaging (redo when applying other filters)
dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

# reflect the data around the lower Ks bound to account for boundary effects
ks = c(dff$x, -dff$x + lower_bound)

# plot a histogram and KDE on top
hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
lines(density(ks), xlim=c(0, upper_bound))

- Cansu (Statistics Globe)
  December 19, 2022 4:09 pm
  
  Hello Ardy,
  
  Thank you for following us! Would you like to draw two density lines overlaid on each other on a histogram?
  
  Regards,
  Cansu
  

Ardy
December 19, 2022 9:05 pm

Hello Cansu
Yes that is right.
Regards
Ardy

Cansu (Statistics Globe)
December 20, 2022 10:25 am

Hello Ardy,

Here is how to do it in two ways (via the graphics and ggplot2 packages):

Expanding the data frame for this example:

set.seed(18462)                                       # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)),
                   y = round(rnorm(1000, 20, 10)))
head(data)                                            # Print example data

Using R graphics:

hist(data$x, prob = TRUE)   
lines(density(data$x), col = "red")   
lines(density(data$y), col = "blue")
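
One caveat for the Base R version: hist sets the axis limits based on data$x alone, so parts of the blue curve may fall outside the plotting region. A minimal sketch of one way around this, assuming the illustrative object names dens_x, dens_y, h, x_range, and y_max, is to compute the limits from the bars and both densities first:

```r
set.seed(18462)                                       # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)),
                   y = round(rnorm(1000, 20, 10)))

dens_x <- density(data$x)                             # Kernel densities of both columns
dens_y <- density(data$y)
h <- hist(data$x, plot = FALSE)                       # Compute bars without drawing

x_range <- range(h$breaks, dens_x$x, dens_y$x)        # Limits covering bars & curves
y_max <- max(h$density, dens_x$y, dens_y$y)

hist(data$x, prob = TRUE,                             # Histogram with wider limits
     xlim = x_range, ylim = c(0, y_max))
lines(dens_x, col = "red")                            # Overlay both density curves
lines(dens_y, col = "blue")
```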

Using ggplot2 library:

ggplot(data, aes(x)) +                                # ggplot2 histogram & density
  geom_histogram(aes(y = after_stat(density))) +      # after_stat() replaces deprecated stat()
  geom_density(aes(x = x), col = "red") +
  geom_density(aes(x = y), col = "blue")

I hope this answers your question. Let me know if you have any further comments.

Regards,
Cansu

Ardy
December 20, 2022 10:39 am

Dear Cansu
Thanks a lot for your help. Sorry, I am so new to R. Could you please let me know how/where to fit this code in the context of the following, if possible?

# read in data
df = read.csv("your_distribution1.tsv", sep="\t")
df = read.csv("your_distribution2.tsv", sep="\t")

# filter Ks distribution (0.001 < Ks < 5)
lower_bound = 0.001
upper_bound = 5
df = df[df$Ks < upper_bound & df$Ks > lower_bound,]

# perform node-averaging (redo when applying other filters)
dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

# reflect the data around the lower Ks bound to account for boundary effects
ks = c(dff$x, -dff$x + lower_bound)

# plot a histogram and KDE on top
hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
lines(density(ks), xlim=c(0, upper_bound))

Ardy
December 20, 2022 10:48 am

Sorry, I am not sure about the term overlay! It is probably better to say: overlap two graphs for comparison purposes.

- Cansu (Statistics Globe)
  December 20, 2022 10:58 am
  
  Hello again:)
  
  Could we say that you would like to plot multiple histograms on the same panel? Not fitting multiple curves?
  
  Regards,
  Cansu
  
Ardy
December 20, 2022 11:09 am

Yes, that is correct.

- Cansu (Statistics Globe)
  December 20, 2022 3:20 pm
  
  You can check our tutorial Draw Multiple Overlaid Histograms with ggplot2 Package in R and this link.
  
  I hope those help!
  Regards,
  Cansu
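  
  As a rough sketch of the overlaid-histogram idea in Base R: the code below uses two simulated samples as stand-ins for the two .tsv files, and the object names sample1, sample2, shared_breaks, h1, h2, and y_max are only illustrative. Both histograms share the same breaks, and semi-transparent colors keep the overlapping bars visible:

  ```r
  set.seed(18462)                                     # Simulated stand-ins for two files
  sample1 <- rnorm(1000, 10, 10)
  sample2 <- rnorm(1000, 20, 10)

  shared_breaks <- pretty(range(sample1, sample2),    # Common breaks for both samples
                          n = 30)

  h1 <- hist(sample1, breaks = shared_breaks, plot = FALSE)
  h2 <- hist(sample2, breaks = shared_breaks, plot = FALSE)
  y_max <- max(h1$density, h2$density)                # Upper limit of the y-axis

  plot(h1, freq = FALSE, col = rgb(1, 0, 0, 0.4),     # First histogram
       ylim = c(0, y_max), main = "Overlaid Histograms", xlab = "Value")
  plot(h2, freq = FALSE, col = rgb(0, 0, 1, 0.4),     # Second histogram on top
       add = TRUE)
  lines(density(sample1), col = "red")                # Density curve per sample
  lines(density(sample2), col = "blue")
  ```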
  
md3948869
April 17, 2023 1:26 am

Thanks for your excellent website. I tried to draw a histogram with two densities using the following code.

ggplot(data, aes(x)) + # ggplot2 histogram & density
  geom_histogram(aes(y = stat(density))) +
  geom_density(data=data, aes(x=x, y=stat(density)), col = "red") +
  geom_density(data=data, aes(x=y, y=stat(density)), col = "blue")
But the code doesn’t work.
I also tried to add a legend to my graph, but I couldn’t. For example:
set.seed(18462) # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))

p <- ggplot(data, aes(x)) + # ggplot2 histogram & density
  geom_histogram(aes(y = stat(density))) +
  geom_density(col = "red")
pal <- c("Observed"="black","Estimated"="blue")
pal
p <- p + scale_colour_manual(values = pal, limits = names(pal),
                             guide = guide_legend(reverse = TRUE))

But the legend does not appear.
I appreciate your help.

- Cansu (Statistics Globe)
  April 18, 2023 2:47 pm
  Hello,
  
  Let’s first fix the plot that gives an error. You need two different datasets to plot two different density curves. I created two sample datasets for that: data and data2. Then modify your code as follows:
  
  library(ggplot2)
  
  data <- data.frame(var1 = rnorm(1000, 10, 2))
  data2 <- data.frame(x = rnorm(1000, 15, 2))
  
  ggplot(data, aes(x = var1)) +
    geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
    geom_density(color = "red", linewidth = 1) +
    geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
    labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
    theme_classic()
  
  Please let me know if it works. Then try to add a legend. If it doesn’t work, let me know.
  
  Regards,
  Cansu
  - md3948869
    April 19, 2023 12:52 am
    
    Thanks for your effort.
    I’m sorry for again asking for help. The graph is OK, but I can’t add a legend to the graph.
    p <- ggplot(data, aes(x = var1)) +
      geom_histogram(aes(y = ..density..), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
      geom_density(color = "red", size = 1) +
      geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
      labs(title = "Histogram and Density Curves", x = "Value", y = "Density")
    p <- p + scale_colour_manual(name = "Legend", values = c("hist" = "black", "data" = "red", "data2" = "blue"))
    p
    Looking forward 4 your answer.
    
    - Cansu (Statistics Globe)
      April 19, 2023 9:00 am
      Hello,
      
      I have renamed the data for better visualization as follows.
      
      data <- data.frame(Data1 = rnorm(1000, 10, 2))
      data2 <- data.frame(x = rnorm(1000, 15, 2))
      
      You can use the following script to specify the densities in the legend.
      
      ggplot(data, aes(x = Data1)) +
        geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
        geom_density(aes(color = "Data1"), linewidth = 1) +
        geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
        scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
        theme_classic()
      
      I see that you also want to specify the histogram. I am unsure how necessary it is since the red curve already represents the same data as the histogram. However, you can use something like this.
      
      ggplot(data, aes(x = Data1)) +
        geom_histogram(aes(y = after_stat(density), fill = "Data1"), bins = 20, alpha = 0.5, color = "black") +
        geom_density(aes(color = "Data1"), linewidth = 1) +
        geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
        scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
        scale_fill_manual(name = "Legend", values = c("gray"), labels = c("Data1")) +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
        theme_classic()
      
      I hope those solutions help you.
      
      Regards,
      Cansu
      - md3948869
        April 19, 2023 10:14 am
        
        Thanks Cansu.
        I want to be a statistician like you !!!!
      - Cansu (Statistics Globe)
        April 19, 2023 10:30 am
        
        I am sure you will be.
        
        Have a good one!
        Cansu