Overlay Histogram with Fitted Density Curve in Base R & ggplot2 Package (2 Examples)

In this tutorial you’ll learn how to fit a density plot to a histogram in the R programming language.

Let’s just jump right in!

Introduction of Example Data

In the examples of this R programming tutorial, we’ll use the following example data:

set.seed(18462)                                       # Create example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))
head(data)                                            # Print example data
#     x
# 1   6
# 2   7
# 3  14
# 4   4
# 5 -10
# 6  16

As you can see in the output of the RStudio console, our example data contains a single numeric column. Now, let’s draw these data.

Example 1: Histogram & Density with Base R

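Example 1 explains how to fit a density curve to a histogram with the basic installation of the R programming language. First, we need to use the hist function to draw a histogram. Note that we set prob = TRUE so that the y-axis shows densities instead of counts (the bar areas then sum to 1); otherwise, the density curve would not be on the same scale as the bars: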

hist(data$x, prob = TRUE)                             # Create histogram with Base R

 

Figure 1: Histogram Created with Base R.

Figure 1 shows the output of the previous R code: A histogram without a density line. If we want to add a kernel density estimate to this graph, we can combine the lines and density functions:

lines(density(data$x), col = "red")                   # Overlay density curve

 

Figure 2: Histogram & Overlaid Density Plot Created with Base R.

Figure 2 illustrates the final result of Example 1: A histogram with a fitted density curve created in Base R.
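Two settings often matter when overlaying the curve in Base R. First, hist chooses its y-axis limits from the bars alone, so the density curve can be cut off if it peaks above the tallest bar. Second, the smoothness of the curve depends on the bandwidth of density; its adjust argument multiplies the default bandwidth. A minimal sketch combining both (the value adjust = 2 is only an illustration, not a recommendation):

```r
set.seed(18462)                                       # Recreate the example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))

dens        <- density(data$x)              # default bandwidth (bw.nrd0)
dens_smooth <- density(data$x, adjust = 2)  # twice the default bandwidth

# hist() sizes the y-axis from the bars only, so include the peaks of the curves too
h     <- hist(data$x, plot = FALSE)
y_max <- max(h$density, dens$y, dens_smooth$y)

hist(data$x, prob = TRUE, ylim = c(0, y_max))  # histogram on the density scale
lines(dens, col = "red")                       # default bandwidth
lines(dens_smooth, col = "blue", lty = 2)      # smoother curve
```

Values of adjust below 1 give a wigglier curve that follows the bars more closely; values above 1 smooth it out.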

 

Example 2: Histogram & Density with ggplot2 Package

Example 2 shows how to create a histogram with a fitted density plot based on the ggplot2 add-on package. First, we need to install and load the ggplot2 package:

install.packages("ggplot2")                           # Install & load ggplot2
library("ggplot2")
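If ggplot2 may already be installed on your computer, reinstalling it every time is unnecessary. A small sketch that installs the package only when requireNamespace cannot find it, and then loads it:

```r
# Install ggplot2 only if it is not yet available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library("ggplot2")
```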

Now, we can use a combination of the ggplot, geom_histogram, and geom_density functions to create our graphic:

ggplot(data, aes(x)) +                                # ggplot2 histogram & density
  geom_histogram(aes(y = after_stat(density))) +      # after_stat() replaces the deprecated stat()
  geom_density(col = "red")

 

Figure 3: Histogram & Overlaid Density Plot Created with ggplot2 Package.

Figure 3 visualizes our histogram and density line created with the ggplot2 package. Note that the histogram bars of Example 1 and Example 2 look slightly different: by default, geom_histogram uses 30 bins, whereas the hist function of Base R chooses its breaks with Sturges’ rule.
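If you want the ggplot2 bars to match Figure 1, one option is to reuse the breaks computed by hist and pass them to the breaks argument of geom_histogram. A sketch, assuming ggplot2 3.3.0 or newer (for after_stat):

```r
library("ggplot2")

set.seed(18462)                                       # Recreate the example data
data <- data.frame(x = round(rnorm(1000, 10, 10)))

# Breaks that Base R's hist() would use (Sturges' rule by default)
br <- hist(data$x, plot = FALSE)$breaks

p <- ggplot(data, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), breaks = br) +  # same bins as hist()
  geom_density(col = "red")
p
```

Both hist and geom_histogram use right-closed intervals by default, so reusing the breaks vector is usually enough to get the same bars.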

 

Video, Further Resources & Summary

Some time ago, I published a video on my YouTube channel that illustrates the content of this tutorial.

Furthermore, you might want to have a look at some of the related articles which I have published on my homepage.

In this tutorial, I illustrated how to combine histograms with densities on the y-axis and kernel density curves in the R programming language. If you have additional questions or comments, let me know in the comments section below.

15 Comments

  • Hello Joachim

    Thanks for your nice videos. I have the following R script which is for only one .tsv file. I want to tweak it in a way that can plot (Histogram + line) two similar but separate .tsv files with different colours overlaid on each other. Could you please guide?

    # read in data
    df = read.csv("your_distribution.tsv", sep="\t")

    # filter Ks distribution (0.001 < Ks < 5)
    lower_bound = 0.001
    upper_bound = 5
    df = df[df$Ks < upper_bound,]
    df = df[df$Ks > lower_bound,]

    # perform node-averaging (redo when applying other filters)
    dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

    # reflect the data around the lower Ks bound to account for boundary effects
    ks = c(dff$x, -dff$x + lower_bound)

    # plot a histogram and KDE on top
    hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
    lines(density(ks), xlim=c(0, upper_bound))

  • Hello Cansu
    Yes that is right.
    Regards
    Ardy

    • Hello Ardy,

      Here is how to do it in two ways (via graphics and ggplot2 libraries):

      Expanding the data frame for this example:

      set.seed(18462)                                       # Create example data
      data <- data.frame(x = round(rnorm(1000, 10, 10)), y= round(rnorm(1000, 20, 10)))
      head(data)                                            # Print example data

      Using R graphics:

      hist(data$x, prob = TRUE)   
      lines(density(data$x), col = "red")   
      lines(density(data$y), col = "blue")

      Using ggplot2 library:

      ggplot(data, aes(x)) +                                # ggplot2 histogram & density
        geom_histogram(aes(y = after_stat(density))) +
        geom_density(data=data, aes(x=x, y=after_stat(density)), col = "red") +
        geom_density(data=data, aes(x=y, y=after_stat(density)), col = "blue")

      I hope this answers your question. Let me know if you have any further comments.

      Regards,
      Cansu

  • Dear Cansu
    Thanks a lot for your help. Sorry, I am very new to R. Could you please let me know how/where to fit this code into the context of the following, if possible?

    # read in data
    df = read.csv("your_distribution1.tsv", sep="\t")
    df = read.csv("your_distribution2.tsv", sep="\t")

    # filter Ks distribution (0.001 < Ks < 5)
    lower_bound = 0.001
    upper_bound = 5
    df = df[df$Ks < upper_bound,]
    df = df[df$Ks > lower_bound,]

    # perform node-averaging (redo when applying other filters)
    dff = aggregate(df$Ks, list(df$Family, df$Node), mean)

    # reflect the data around the lower Ks bound to account for boundary effects
    ks = c(dff$x, -dff$x + lower_bound)

    # plot a histogram and KDE on top
    hist(ks, prob=TRUE, xlim=c(0, upper_bound), n=50)
    lines(density(ks), xlim=c(0, upper_bound))
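For the two-file case asked about here, one possible approach is to wrap the filtering, node-averaging, and reflection steps in a function, apply it to each file, and overlay the results with add = TRUE and semi-transparent colours. A minimal sketch: prepare_ks is a hypothetical helper name, and because the original .tsv files are not available, the block simulates two data frames with the same Ks, Family, and Node columns (the read.csv calls are shown as comments):

```r
# Hypothetical helper: applies the filter, node-averaging and reflection
# steps of the script above to one data frame of Ks values
prepare_ks <- function(df, lower_bound = 0.001, upper_bound = 5) {
  df  <- df[df$Ks > lower_bound & df$Ks < upper_bound, ]   # 0.001 < Ks < 5
  dff <- aggregate(df$Ks, list(df$Family, df$Node), mean)  # node-averaging
  c(dff$x, -dff$x + lower_bound)                           # reflection as in the script
}

# With real files, the inputs would be read like this:
# df1 <- read.csv("your_distribution1.tsv", sep = "\t")
# df2 <- read.csv("your_distribution2.tsv", sep = "\t")
# Simulated stand-ins with the same column names (assumption, for illustration only)
set.seed(1)
df1 <- data.frame(Ks = rexp(500, rate = 1),
                  Family = sample(1:50, 500, replace = TRUE),
                  Node = sample(1:5, 500, replace = TRUE))
df2 <- data.frame(Ks = rexp(500, rate = 0.5),
                  Family = sample(1:50, 500, replace = TRUE),
                  Node = sample(1:5, 500, replace = TRUE))

ks1 <- prepare_ks(df1)
ks2 <- prepare_ks(df2)

upper_bound <- 5
# The first histogram sets up the plot; the second is drawn on top with add = TRUE
hist(ks1, prob = TRUE, xlim = c(0, upper_bound), breaks = 50,
     col = rgb(1, 0, 0, 0.4), main = "Ks distributions", xlab = "Ks")
hist(ks2, prob = TRUE, breaks = 50, col = rgb(0, 0, 1, 0.4), add = TRUE)
lines(density(ks1), col = "red")
lines(density(ks2), col = "blue")
legend("topright", legend = c("File 1", "File 2"),
       fill = c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4)))
```

Storing each file in its own variable (df1, df2) is what keeps the second read.csv call from overwriting the first.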

  • Sorry, I am not sure about the term "overlay". It would be better to say overlapping two graphs for comparison purposes.

  • Yes, that is correct.

  • Thanks for your excellent website. I tried to draw a histogram with two densities using the following code.

    ggplot(data, aes(x)) + # ggplot2 histogram & density
      geom_histogram(aes(y = stat(density))) +
      geom_density(data=data, aes(x=x, y=stat(density)), col = "red") +
      geom_density(data=data, aes(x=y, y=stat(density)), col = "blue")
    But the code doesn’t work.
    I also tried to add a legend to my graph, but I couldn’t. For example:
    set.seed(18462) # Create example data
    data <- data.frame(x = round(rnorm(1000, 10, 10)))

    p <- ggplot(data, aes(x)) + # ggplot2 histogram & density
    geom_histogram(aes(y = stat(density))) +
    geom_density(col = "red")
    pal <- c("Observed"="black","Estimated"="blue")
    pal
    p <- p + scale_colour_manual(values = pal, limits = names(pal),
    guide = guide_legend(reverse = TRUE))

    But the legend does not appear.
    I appreciate your help.

    • Hello,

      Let’s first fix the plot that gives an error. You need two different datasets to plot two different density curves, so I created two sample datasets for that: data and data2. Then modify your code as follows:

      library(ggplot2)
       
      data <- data.frame(var1 = rnorm(1000, 10, 2))
      data2 <- data.frame(x = rnorm(1000, 15, 2))
       
      ggplot(data, aes(x = var1)) +
        geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
        geom_density(color = "red", linewidth = 1) +
        geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
        theme_classic()

      Please let me know if it works. Then try to add a legend. If it doesn’t work, let me know.

      Regards,
      Cansu

      • Thanks for your effort.
        Sorry for asking for help again. The graph is OK, but I can’t add a legend to the graph.
        p <- ggplot(data, aes(x = var1)) +
        geom_histogram(aes(y = ..density..), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
        geom_density(color = "red", size = 1) +
        geom_density(data = data2, aes(x = x), color = "blue", linewidth = 1, linetype = "dashed") +
        labs(title = "Histogram and Density Curves", x = "Value", y = "Density")
        p<- p + scale_colour_manual(name="Legend", values = c("hist" = "black", "data" = "red", "data2" = "blue"))
        p
        Looking forward to your answer.

        • Hello,

          I have renamed the data for better visualization as follows.

          data <- data.frame(Data1 = rnorm(1000, 10, 2))
          data2 <- data.frame(x = rnorm(1000, 15, 2))

          You can use the following script to specify the densities in the legend.

          ggplot(data, aes(x = Data1)) +
            geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5, color = "black", fill = "gray") +
            geom_density(aes(color = "Data1"), linewidth = 1) +
            geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
            scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
            labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
            theme_classic()

          I see that you also want to specify the histogram. I am unsure how necessary it is since the red curve already represents the same data as the histogram. However, you can use something like this.

          ggplot(data, aes(x = Data1)) +
            geom_histogram(aes(y = after_stat(density), fill = "Data1"), bins = 20, alpha = 0.5, color = "black") +
            geom_density(aes(color = "Data1"), linewidth = 1) +
            geom_density(data = data2, aes(x = x, color = "Data2"), linewidth = 1, linetype = "dashed") +
            scale_color_manual(name = "Legend", values = c("red", "blue"), labels = c("Data1", "Data2")) +
            scale_fill_manual(name = "Legend", values = c("gray"), labels = c("Data1")) +
            labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
            theme_classic()

          I hope those solutions help you.

          Regards,
          Cansu
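An alternative that avoids setting up the legend scales by hand is to stack both samples into one data frame with a column naming the source, and map that column to colour and fill; ggplot2 then builds the legend automatically. A sketch, with data and data2 simulated as in the replies above, assuming ggplot2 3.4.0 or newer (for linewidth):

```r
library("ggplot2")

set.seed(18462)
data  <- data.frame(Data1 = rnorm(1000, 10, 2))
data2 <- data.frame(x = rnorm(1000, 15, 2))

# Long format: one value column plus a column naming the source sample
long <- data.frame(
  value = c(data$Data1, data2$x),
  group = rep(c("Data1", "Data2"), times = c(nrow(data), nrow(data2)))
)

cols <- c(Data1 = "red", Data2 = "blue")

p <- ggplot(long, aes(x = value, colour = group, fill = group)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.3,
                 position = "identity") +      # overlay the bars instead of stacking them
  geom_density(fill = NA, linewidth = 1) +       # outline curves only, no filled area
  scale_colour_manual(name = "Legend", values = cols) +
  scale_fill_manual(name = "Legend", values = cols) +
  labs(title = "Histogram and Density Curves", x = "Value", y = "Density") +
  theme_classic()
p
```

Because the colour and fill scales share the same name and the same groups, ggplot2 merges them into a single legend, and after_stat(density) is computed per group, so each histogram is scaled to match its own density curve.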
