Replace Missing Values by Column Mean in R (3 Examples)

 

In this R tutorial you’ll learn how to replace NA values with the mean of a data frame variable.

Let’s get started…

 

Creation of Example Data

As a first step, we’ll have to create some example data:

data <- data.frame(x1 = c(NA, 2:10),                       # Create data frame
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))
data                                                       # Print data frame
#    x1 x2 x3
# 1  NA  5  4
# 2   2  5 NA
# 3   3  5  1
# 4   4  5  5
# 5   5  5  6
# 6   6  5  7
# 7   7  5 NA
# 8   8  5  5
# 9   9 NA  9
# 10 10 NA  0

As you can see based on the previous output of the RStudio console, our example data has ten rows and three numeric columns. Each of the variables contains at least one missing value (i.e. NA).
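Before replacing anything, it can be useful to count the missing values in each column. The following sketch recreates the example data and combines base R’s is.na() with colSums():

```r
# Recreate the example data from above
data <- data.frame(x1 = c(NA, 2:10),
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_count <- colSums(is.na(data))
na_count
# x1 x2 x3 
#  1  2  2 
```

This confirms that x1 contains one and x2 and x3 each contain two missing values.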

 

Example 1: Replacing Missing Data in One Specific Variable Using is.na() & mean() Functions

In this example, I’ll show how to replace the NA values in only one particular data frame column with the mean of that column. For this, we can combine the is.na and mean functions as shown below:

data1 <- data                                              # Duplicate data frame
data1$x1[is.na(data1$x1)] <- mean(data1$x1, na.rm = TRUE)  # Replace NA in one column
data1                                                      # Print updated data frame
#    x1 x2 x3
# 1   6  5  4
# 2   2  5 NA
# 3   3  5  1
# 4   4  5  5
# 5   5  5  6
# 6   6  5  7
# 7   7  5 NA
# 8   8  5  5
# 9   9 NA  9
# 10 10 NA  0

Have a look at the previous output of the RStudio console: As you can see, the first cell of the variable x1, which was missing, has been replaced by the mean of the observed values of x1 (i.e. 6).
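The na.rm = TRUE argument in Example 1 is essential: without it, mean() returns NA as soon as the column contains a single missing value. The following sketch recreates the example data, illustrates this behavior, and shows an equivalent one-column replacement using base R’s replace() function (an alternative way of writing the same operation, not a different result):

```r
# Recreate the example data from above
data <- data.frame(x1 = c(NA, 2:10),
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))

mean(data$x1)                 # NA, because the missing value propagates
mean(data$x1, na.rm = TRUE)   # 6, i.e. sum(2:10) / 9 = 54 / 9

# Same replacement as in Example 1, written with replace()
data1 <- data
data1$x1 <- replace(data1$x1, is.na(data1$x1), mean(data1$x1, na.rm = TRUE))
data1$x1
# [1]  6  2  3  4  5  6  7  8  9 10
```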

 

Example 2: Replacing Missing Data in All Variables Using for-Loop

This example illustrates how to replace the NA values in all (numeric) columns of your data frame using a for-loop.

Have a look at the following R code:

data2 <- data                                              # Duplicate data frame
for(i in seq_len(ncol(data2))) {                           # Replace NA in all columns
  data2[ , i][is.na(data2[ , i])] <- mean(data2[ , i], na.rm = TRUE)
}
data2                                                      # Print updated data frame
#    x1 x2    x3
# 1   6  5 4.000
# 2   2  5 4.625
# 3   3  5 1.000
# 4   4  5 5.000
# 5   5  5 6.000
# 6   6  5 7.000
# 7   7  5 4.625
# 8   8  5 5.000
# 9   9  5 9.000
# 10 10  5 0.000

All NA values of our data frame were replaced by the mean of the corresponding column.
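The for-loop of Example 2 assumes that every column is numeric; for a character column, mean() simply returns NA with a warning. If your data frame also contains non-numeric variables, one option is to restrict the replacement to the numeric columns. The following is a minimal base R sketch; the character column group is hypothetical and only added for illustration:

```r
# Recreate the example data from above
data <- data.frame(x1 = c(NA, 2:10),
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))
data$group <- c(rep("a", 5), rep("b", 5))          # hypothetical character column

num_cols <- vapply(data, is.numeric, logical(1))   # TRUE for x1, x2, x3; FALSE for group

data2 <- data
data2[num_cols] <- lapply(data2[num_cols], function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)       # fill this column's NAs with its mean
  col
})
data2
```

The numeric columns x1, x2, and x3 end up with the same values as in Example 2, while the group column is left untouched.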

 

Example 3: Replacing Missing Data in All Variables Using na.aggregate() Function of zoo Package

You might say that the R syntax of Example 2 was relatively complicated. Fortunately, the zoo package provides a much simpler alternative for replacing all missing values by column means.

If we want to use the functions and commands of the zoo package, we first have to install and load zoo:

install.packages("zoo")                                    # Install & load zoo package
library("zoo")

Now, we can use the na.aggregate function to replace all missing data. By default, na.aggregate replaces each NA with the mean of its column (argument FUN = mean):

data3 <- na.aggregate(data)                                # Replace NA in all columns
data3                                                      # Print updated data frame
#    x1 x2    x3
# 1   6  5 4.000
# 2   2  5 4.625
# 3   3  5 1.000
# 4   4  5 5.000
# 5   5  5 6.000
# 6   6  5 7.000
# 7   7  5 4.625
# 8   8  5 5.000
# 9   9  5 9.000
# 10 10  5 0.000

The output is exactly the same as in Example 2.

 

Video, Further Resources & Summary

Do you want to know more about missing data? Then you may want to have a look at the following video, which I have published on my YouTube channel. In the video, I explain the R code of this article.

 

The YouTube video will be added soon.

 

Also, you could have a look at the related articles that I have published on my homepage.

 

Summary: In this R tutorial you learned how to replace missing values with column means in one or multiple variables. Let me know in the comments below if you have further questions.

 

Subscribe to my free statistics newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.