Replace Missing Values by Column Mean in R (3 Examples)
In this R tutorial you’ll learn how to substitute NA values by the mean of a data frame variable.
Let’s get started…
Creation of Example Data
As a first step, we’ll have to create some example data:
    data <- data.frame(x1 = c(NA, 2:10),                        # Create data frame
                       x2 = c(rep(5, 8), NA, NA),
                       x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))
    data                                                        # Print data frame
    #    x1 x2 x3
    # 1  NA  5  4
    # 2   2  5 NA
    # 3   3  5  1
    # 4   4  5  5
    # 5   5  5  6
    # 6   6  5  7
    # 7   7  5 NA
    # 8   8  5  5
    # 9   9 NA  9
    # 10 10 NA  0
As you can see from the previous output of the RStudio console, our example data has ten rows and three numeric columns. Each of the variables contains at least one missing value (i.e. NA).
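If you want to confirm the number of missing values per column before replacing anything, a quick check might look like the sketch below. It recreates the example data so that it runs on its own; the object name na_counts is only chosen for illustration.

```r
# Recreate the example data so that this block is self-contained
data <- data.frame(x1 = c(NA, 2:10),
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(data))
na_counts
# x1 x2 x3 
#  1  2  2 
```

The variable x1 contains one NA value, while x2 and x3 contain two NA values each.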
Example 1: Replacing Missing Data in One Specific Variable Using is.na() & mean() Functions
In this example, I’ll show how to substitute the NA values in only one particular data frame column by its average. For this, we can use the is.na and mean functions as shown below:
    data1 <- data                                               # Duplicate data frame
    data1$x1[is.na(data1$x1)] <- mean(data1$x1, na.rm = TRUE)   # Replace NA in one column
    data1                                                       # Print updated data frame
    #    x1 x2 x3
    # 1   6  5  4
    # 2   2  5 NA
    # 3   3  5  1
    # 4   4  5  5
    # 5   5  5  6
    # 6   6  5  7
    # 7   7  5 NA
    # 8   8  5  5
    # 9   9 NA  9
    # 10 10 NA  0
Have a look at the previous output of the RStudio console: As you can see, the NA in the first cell of the variable x1 was replaced by the mean of the non-missing values of x1 (i.e. 6). The variables x2 and x3 were not changed.
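If you need the same replacement for several vectors, the logic of Example 1 could be wrapped in a small helper function. The following is only a sketch: the function name impute_mean is hypothetical and not part of base R or of this tutorial's original code.

```r
# Hypothetical helper: replace the NA values of a numeric vector by its mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)   # na.rm = TRUE ignores NA when averaging
  x
}

x1 <- c(NA, 2:10)                        # Same values as the column x1
mean(x1, na.rm = TRUE)                   # Mean of the non-missing values
# [1] 6

x1_filled <- impute_mean(x1)             # NA in the first position becomes 6
anyNA(x1_filled)                         # Check that no NA is left
# [1] FALSE
```

Note that a vector consisting only of NA values has no mean, so mean(x, na.rm = TRUE) returns NaN in that case and the helper would insert NaN instead of a number.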
Example 2: Replacing Missing Data in All Variables Using for-Loop
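This example illustrates how to replace the NA values in all columns of your data with a for-loop. The loop visits each column in turn and fills its missing values with the mean of that column.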
Have a look at the following R code:
    data2 <- data                                               # Duplicate data frame
    for(i in seq_len(ncol(data2))) {                            # Replace NA in all columns
      data2[ , i][is.na(data2[ , i])] <- mean(data2[ , i], na.rm = TRUE)
    }
    data2                                                       # Print updated data frame
    #    x1 x2    x3
    # 1   6  5 4.000
    # 2   2  5 4.625
    # 3   3  5 1.000
    # 4   4  5 5.000
    # 5   5  5 6.000
    # 6   6  5 7.000
    # 7   7  5 4.625
    # 8   8  5 5.000
    # 9   9  5 9.000
    # 10 10  5 0.000
All NA values of our data frame were replaced by the mean of the corresponding column.
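The loop assumes that every column is numeric. As a possible base R variation, the sketch below uses lapply instead of a for-loop and restricts the replacement to numeric columns, so that character or factor columns would be left untouched. The object names data2b and num_cols are only chosen for illustration.

```r
# Recreate the example data so that this block is self-contained
data <- data.frame(x1 = c(NA, 2:10),
                   x2 = c(rep(5, 8), NA, NA),
                   x3 = c(4, NA, 1, 5, 6, 7, NA, 5, 9, 0))

data2b <- data                                        # Duplicate data frame
num_cols <- vapply(data2b, is.numeric, logical(1))    # TRUE for numeric columns

# Apply the replacement to each numeric column and write the result back
data2b[num_cols] <- lapply(data2b[num_cols], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

anyNA(data2b)                                         # Check that no NA is left
# [1] FALSE
```

For the example data, all three columns are numeric, so the values are the same as in the for-loop version.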
Example 3: Replacing Missing Data in All Variables Using na.aggregate() Function of zoo Package
You might say that the R syntax of Example 2 was relatively complicated. Fortunately, the zoo package provides a very simple alternative if we want to replace all missing values by column means.
If we want to use the functions and commands of the zoo package, we first have to install and load zoo:
    install.packages("zoo")                                     # Install zoo package
    library("zoo")                                              # Load zoo package
Now, we can use the na.aggregate function to replace all missing data:
    data3 <- na.aggregate(data)                                 # Replace NA in all columns
    data3                                                       # Print updated data frame
    #    x1 x2    x3
    # 1   6  5 4.000
    # 2   2  5 4.625
    # 3   3  5 1.000
    # 4   4  5 5.000
    # 5   5  5 6.000
    # 6   6  5 7.000
    # 7   7  5 4.625
    # 8   8  5 5.000
    # 9   9  5 9.000
    # 10 10  5 0.000
The output is exactly the same as in Example 2.
Video, Further Resources & Summary
Do you want to know more about missing data? Then you may want to have a look at the following video, which I have published on my YouTube channel. In the video, I explain the R code of this article.
The YouTube video will be added soon.
Also, you could have a look at the related articles that I have published on my homepage.
- Replace NA with 0 (10 Examples for Data Frame, Vector & Column)
- Get Sum of Data Frame Column Values
- Find Missing Values (6 Examples for Data Frame, Column & Vector)
- The R Programming Language
Summary: In this R tutorial you learned how to replace missing values by column means in one or multiple variables. Let me know in the comments below if you have further questions.