Overlay Histogram with Fitted Density Curve in Base R & ggplot2 Package (2 Examples)

 

In this tutorial you’ll learn how to fit a density plot to a histogram in the R programming language.

Let’s just jump right in!

 

Introduction of Example Data

In the examples of this R programming tutorial, we’ll use the following example data:

set.seed(18462)                                       # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))
head(data)                                            # Print example data
#     x
# 1   6
# 2   7
# 3  14
# 4   4
# 5 -10
# 6  16

As you can see based on the output of the RStudio console, our example data contains only one numeric column. Now, let’s plot these data.

 

Example 1: Histogram & Density with Base R

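Example 1 explains how to fit a density curve to a histogram with the basic installation of the R programming language. First, we need to use the hist function to draw a histogram. Setting prob = TRUE puts densities instead of counts on the y-axis, so that the histogram and the density curve share the same scale: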

hist(data$x, prob = TRUE)                             # Create histogram with Base R

 

Figure 1: Histogram Created with Base R.

 

Figure 1 shows the output of the previous R code: A histogram without a density line. If we want to add a kernel density to this graph, we can use a combination of the lines and density functions:

lines(density(data$x), col = "red")                   # Overlay density curve

 

Figure 2: Histogram & Overlaid Density Plot Created with Base R.

 

Figure 2 illustrates the final result of Example 1: A histogram with a fitted density curve created in Base R.
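
Depending on the data, the peak of the density curve can be higher than the tallest bar, in which case the top of the curve is cut off. The following is a minimal sketch of one possible workaround, not part of the original example: it computes the histogram and the density first and then sets the y-axis limit to the larger of the two maxima. The object names h, d, and y_max are placeholders chosen for illustration.

```r
set.seed(18462)                                       # Same example data as above
data <- data.frame(x = round(rnorm(1000, 10, 10)))

h <- hist(data$x, plot = FALSE)                       # Compute bins without drawing
d <- density(data$x)                                  # Kernel density estimate

y_max <- max(h$density, d$y)                          # Highest point of bars or curve

plot(h, freq = FALSE, ylim = c(0, y_max))             # Draw histogram on density scale
lines(d, col = "red")                                 # Overlay density curve

sum(h$density * diff(h$breaks))                       # Bar areas on the density scale add up to 1
```

The last line illustrates why the overlay works: on the density scale the bar areas sum to 1, just like the area under a density curve.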

 

Example 2: Histogram & Density with ggplot2 Package

Example 2 shows how to create a histogram with a fitted density curve using the ggplot2 add-on package. First, we need to install and load ggplot2:

install.packages("ggplot2")                           # Install & load ggplot2
library("ggplot2")

Now, we can use a combination of the ggplot, geom_histogram, and geom_density functions to create our graphic:

ggplot(data, aes(x)) +                                # ggplot2 histogram & density
  geom_histogram(aes(y = after_stat(density))) +
  geom_density(col = "red")

 

Figure 3: Histogram & Overlaid Density Plot Created with ggplot2 Package.

 

Figure 3 visualizes our histogram and density line created with the ggplot2 package. Note that the histogram bars of Example 1 and Example 2 look slightly different, because the two functions choose their bins differently by default: the hist function of Base R picks the breaks based on Sturges' rule, whereas geom_histogram uses 30 bins.
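
In case you want the ggplot2 bars to line up with the Base R bars, one option is to pass the Base R bin edges to the breaks argument of geom_histogram. The code below is a sketch under that assumption. The name base_breaks is a placeholder chosen for illustration, and after_stat requires ggplot2 version 3.3.0 or later.

```r
library(ggplot2)

set.seed(18462)                                       # Same example data as above
data <- data.frame(x = round(rnorm(1000, 10, 10)))

base_breaks <- hist(data$x, plot = FALSE)$breaks      # Bin edges chosen by Base R

p <- ggplot(data, aes(x)) +                           # ggplot2 histogram with Base R bins
  geom_histogram(aes(y = after_stat(density)),
                 breaks = base_breaks) +
  geom_density(col = "red")
p
```

Alternatively, the binwidth or bins arguments of geom_histogram can be used to set the bar width by hand.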

 

Summary


In this tutorial, I illustrated how to combine histograms scaled to density on the y-axis with kernel density curves in the R programming language. If you have additional questions or comments, let me know in the comments section below.

 


15 Comments

  • Hello Joachim

    Thanks for your nice videos. I have the following R script which is for only one .tsv file. I want to tweak it in a way that can plot (Histogram + line) two similar but separate .tsv files with different colours overlaid on each other. Could you please guide?

    # read in data
    df = read.csv("your_distribution.tsv", sep="\t")

    # filter Ks distribution (0.001 < Ks < 5)
    lower_bound = 0.001
    upper_bound = 5
    df = df[df$Ks > lower_bound & df$Ks < upper_bound,]

    # perform node-averaging (redo when applying other filters)
    dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

    # reflect the data around the lower Ks bound to account for boundary effects
    ks = c(dff$x, -dff$x + lower_bound)

    # plot a histogram and KDE on top
    hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
    lines(density(ks), xlim=c(0, upper_bound))

  • Hello Cansu
    Yes that is right.
    Regards
    Ardy

    • Hello Ardy,

      Here is how to do it in two ways (via graphics and ggplot2 libraries):

      Expanding the data frame for this example:

      set.seed(18462)                                       # Create example data
      data <- data.frame(x = round(rnorm(1000, 10, 10)), y= round(rnorm(1000, 20, 10)))
      head(data)                                            # Print example data

      Using R graphics:

      hist(data$x, prob = TRUE)   
      lines(density(data$x), col = "red")   
      lines(density(data$y), col = "blue")

      Using ggplot2 library:

      ggplot(data, aes(x)) +                                # ggplot2 histogram & density
        geom_histogram(aes(y = stat(density))) +
        geom_density(data=data, aes(x=x, y=stat(density)), col = "red") +
        geom_density(data=data, aes(x=y, y=stat(density)), col = "blue")

      I hope this answers your question. Let me know if you have any further comments.

      Regards,
      Cansu

  • Dear Cansu
    Thanks a lot for your help. Sorry, I am so new to R. Could you please let me know how and where to fit this code into the context of the following script, if possible?

    # read in data
    df = read.csv("your_distribution1.tsv", sep="\t")
    df = read.csv("your_distribution2.tsv", sep="\t")

    # filter Ks distribution (0.001 < Ks < 5)
    lower_bound = 0.001
    upper_bound = 5
    df = df[df$Ks > lower_bound & df$Ks < upper_bound,]

    # perform node-averaging (redo when applying other filters)
    dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

    # reflect the data around the lower Ks bound to account for boundary effects
    ks = c(dff$x, -dff$x + lower_bound)

    # plot a histogram and KDE on top
    hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
    lines(density(ks), xlim=c(0, upper_bound))

  • Sorry I am not sure about term overlay! But better to say overlap two graphs for comparison purpose.

  • Yes, that is correct.

  • Thanks for your excellent website. I tried to draw a histogram with two densities using the following code.

    ggplot(data, aes(x)) + # ggplot2 histogram & density
    geom_histogram(aes(y = stat(density))) +
    geom_density(data=data, aes(x=x, y=stat(density)), col = "red") +
    geom_density(data=data, aes(x=y, y=stat(density)), col = "blue")
    But the code doesn’t work.
    I also tried to add a legend to my graph. but I couldn’t. For example.
    set.seed(18462) # Create example data
    data <- data.frame(x = round(rnorm(1000, 10, 10)))

    p <- ggplot(data, aes(x)) + # ggplot2 histogram & density
    geom_histogram(aes(y = stat(density))) +
    geom_density(col = "red")
    pal <- c("Observed"="black","Estimated"="blue")
    pal
    p <- p + scale_colour_manual(values = pal, limits = names(pal),
    guide = guide_legend(reverse = TRUE))

    But legend will not appear.
    I appreciate your help.

    • Hello,

      Let’s first fix the plot that gives an error. You need two different datasets to plot two different density curves. For that, I created two sample datasets: data and data2. Then modify your code as follows:

      library(ggplot2)
       
      data <- data.frame(var1 = rnorm(1000, 10, 2))
      data2 <- data.frame(x = rnorm(1000, 15, 2))
       
      ggplot(data, aes(x = var1)) +
        geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
        geom_density(color = "red", linewidth = 1) +
        geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
        theme_classic()

      Please let me know if it works. Then try to add a legend. If it doesn’t work, let me know.

      Regards,
      Cansu

      • Thanks for your effort.
        I’m sorry for again asking for help. The graph is OK, but I can’t add a legend to the graph.
        p <- ggplot(data, aes(x = var1)) +
        geom_histogram(aes(y = ..density..), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
        geom_density(color = "red", size = 1) +
        geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density")
        p<- p + scale_colour_manual(name="Legend", values = c("hist" = "black", "data" = "red", "data2" = "blue"))
        p
        Looking forward 4 your answer.

        • Hello,

          I have renamed the data for better visualization as follows.

          data <- data.frame(Data1 = rnorm(1000, 10, 2))
          data2 <- data.frame(x = rnorm(1000, 15, 2))

          You can use the following script to specify the densities in the legend.

          ggplot(data, aes(x = Data1)) +
            geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
            geom_density(aes(color = "Data1"), linewidth = 1) +
            geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
            scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
            labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
            theme_classic()

          I see that you also want to specify the histogram. I am unsure how necessary it is since the red curve already represents the same data as the histogram. However, you can use something like this.

          ggplot(data, aes(x = Data1)) +
            geom_histogram(aes(y = after_stat(density), fill = "Data1"), bins = 20, alpha = 0.5, color = "black") +
            geom_density(aes(color = "Data1"), linewidth = 1) +
            geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
            scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
            scale_fill_manual(name = "Legend", values = c("gray"), labels = c("Data1")) +
            labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
            theme_classic()

          I hope those solutions help you.

          Regards,
          Cansu
