Replace NA with Mean by Group in R (2 Examples)

 

In this tutorial you’ll learn how to replace NA values with the mean by group in the R programming language.

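The tutorial is structured as follows:

1) Data Sample & Install Packages
2) Example 1: Replace NA with Mean by Group in all Columns
3) Example 2: Replace NA with Mean by Group in all Numeric Columns
4) Video, Further Resources & Summary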

Let’s take a look.

 

Data Sample & Install Packages

Let’s say we have a data sample similar to the following one:

data_sample <- data.frame(group = c("A","A", "A", "B", "B","B", "C", "C","C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))
data_sample
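
Before replacing anything, it can help to check how each column is stored and how many values are missing. The following is a minimal sketch in base R, assuming R 4.0 or later (where data.frame() does not convert strings to factors); the object names col_types and na_counts are only illustrative.

```r
# Recreate the example data so that this block runs on its own
data_sample <- data.frame(group = c("A","A", "A", "B", "B","B", "C", "C","C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

# Storage type of each column (x3 is character, not numeric)
col_types <- sapply(data_sample, class)
col_types

# Number of missing values per column
na_counts <- colSums(is.na(data_sample))
na_counts
```

Note that x3 is stored as character, which matters later because mean() only works on numeric (or logical) input.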


To replace the missing values by group in our data frame, we need the dplyr and tidyr packages. Install them first if they are not installed yet:

install.packages('dplyr')
install.packages('tidyr')

Now, load the packages before we start:

library(dplyr)
library(tidyr)

 

Example 1: Replace NA with Mean by Group in all Columns

Now, we can get started. To replace the NA values with the mean by group in our data sample, we will use the group_by() function from the dplyr package to divide our data into groups (i.e. A, B, and C).

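Next, also from the dplyr package, we will use the functions mutate_at() and vars() to specify the variables we want to modify. Last, we will use the replace_na() function from the tidyr package together with mean() to replace each missing value with the mean of its group. Note that mean() needs numeric input, whereas x3 is stored as character, so we convert x3 with as.numeric() first. Without this step, mean() returns NA for x3 with a warning, and recent tidyr versions may refuse to put a numeric replacement into a character column.

data_sample %>%
  mutate(x3 = as.numeric(x3)) %>%                # x3 is character; mean() needs numbers
  group_by(group) %>%                            # split into groups A, B, and C
  mutate_at(vars(x1, x2, x3),                    # columns to modify
            ~replace_na(.,
                        mean(., na.rm = TRUE)))  # group mean, ignoring NAs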
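
In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). The following is a sketch of the same replacement written with across(), assuming a recent dplyr and tidyr; the result name data_all is only illustrative, and x3 is converted to numeric first because it is stored as character.

```r
library(dplyr)
library(tidyr)

# Recreate the example data so that this block runs on its own
data_sample <- data.frame(group = c("A","A", "A", "B", "B","B", "C", "C","C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

data_all <- data_sample %>%
  mutate(x3 = as.numeric(x3)) %>%     # mean() needs numeric input
  group_by(group) %>%                 # compute means within A, B, and C
  mutate(across(c(x1, x2, x3),
                ~ replace_na(.x, mean(.x, na.rm = TRUE)))) %>%
  ungroup()                           # drop the grouping again

data_all
```

For instance, the missing x1 value in group A is replaced by 2, the mean of the observed values 1 and 3.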


 

Example 2: Replace NA with Mean by Group in all Numeric Columns

In this example, we will again use the group_by() function to divide our data into groups. Next, we use the mutate_if() function from the dplyr package together with is.numeric so that NA values are replaced only in numeric columns. Last, we use the ifelse() function to identify the missing values and replace them with the mean of their group. The character column x3 is left unchanged, so its NA values remain.

data_sample %>% 
  group_by(group) %>% 
  mutate_if(is.numeric, 
            function(x) ifelse(is.na(x), 
                               mean(x, na.rm = TRUE), 
                               x))
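
As with mutate_at(), mutate_if() is superseded by across() in dplyr 1.0.0 and later. A sketch of the equivalent call, assuming a recent dplyr, selects the numeric columns with where(is.numeric); the result name data_num is only illustrative.

```r
library(dplyr)

# Recreate the example data so that this block runs on its own
data_sample <- data.frame(group = c("A","A", "A", "B", "B","B", "C", "C","C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

data_num <- data_sample %>%
  group_by(group) %>%
  mutate(across(where(is.numeric),    # only x1 and x2 qualify
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()

data_num
```

The columns x1 and x2 no longer contain NA values, while the character column x3 keeps its three missing values.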


As shown, when our data is divided into groups, we can handle missing values by replacing them with the group mean, either in explicitly selected columns or exclusively in numeric columns.
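
If installing additional packages is not an option, the same idea can be sketched in base R. This is an alternative to the dplyr approach used in this tutorial, not part of it: it relies on the ave() function to apply a function within each group, and the names data_base and num_cols are only illustrative.

```r
# Recreate the example data so that this block runs on its own
data_sample <- data.frame(group = c("A","A", "A", "B", "B","B", "C", "C","C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

data_base <- data_sample

# Names of the numeric columns (x1 and x2)
num_cols <- names(data_base)[sapply(data_base, is.numeric)]

# For each numeric column, replace NA by the mean of the same group
for (col in num_cols) {
  data_base[[col]] <- ave(data_base[[col]], data_base$group,
                          FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
}

data_base
```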

 

Video, Further Resources & Summary


 


This post has shown how to replace NA values with the average by group in R. In case you have further questions, you may leave a comment below.

 

Paula Villasante Soriano Statistician & R Programmer

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 
