Replace NA with Mean by Group in R (2 Examples)
In this tutorial you’ll learn how to replace NA values with the mean by group in the R programming language.
Let’s take a look.
Data Sample & Install Packages
Let’s say we have a data sample similar to the following one:
data_sample <- data.frame(group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))
data_sample
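Before replacing anything, it can help to check where the missing values are and how each column is stored. The following is a minimal base R sketch for that; the object names na_per_column and column_types are chosen here for illustration and are not part of the original tutorial.

```r
# Recreate the sample data so that this block runs on its own
data_sample <- data.frame(group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

# Count the missing values in each column
na_per_column <- colSums(is.na(data_sample))
na_per_column

# Show how each column is stored; x1 and x2 are numeric, while x3 is
# character under the R >= 4.0 defaults (a factor in older R versions)
column_types <- sapply(data_sample, class)
column_types
```

Each of x1, x2, and x3 contains one missing value per group, and x3 is not numeric, which matters for the examples that follow.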
Also, to replace the missing values by group in our data frame, we need the dplyr and tidyr packages. If they are not installed yet, install them first:
install.packages('dplyr')
install.packages('tidyr')
Now, load the packages before we start:
library(dplyr)
library(tidyr)
Example 1: Replace NA with Mean by Group in all Columns
Now, we can get started. To replace the NA values with the mean by group in our data sample, we will use the group_by() function from the dplyr package so that we can divide our data into groups (i.e. A, B, or C).
Next, also from the dplyr package, we will use the functions mutate_at() and vars() to specify the variables we want to modify. Last, we will use the replace_na() function from the tidyr package together with the mean() function to replace each missing value with its group’s mean. Note that x3 is stored as character, and mean() cannot average character values (it returns NA with a warning), so we convert x3 to numeric first.
data_sample %>%
  mutate(x3 = as.numeric(x3)) %>%   # x3 is character; convert it so that mean() works
  group_by(group) %>%
  mutate_at(vars(x1, x2, x3), ~ replace_na(., mean(., na.rm = TRUE)))
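For comparison, the same group-wise replacement can be sketched in base R without any packages, using ave() to apply a function within each group. This is only an illustrative alternative, not the method of this tutorial. The helper impute_mean() and the object imputed are hypothetical names introduced here, and the conversion of x3 with as.numeric() is an assumption that x3 is meant to be treated as a number.

```r
# Recreate the sample data so that this block runs on its own
data_sample <- data.frame(group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

# Hypothetical helper: replace the NA values of a vector with the mean
# of its observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

imputed <- data_sample
imputed$x3 <- as.numeric(imputed$x3)   # assumption: x3 should be numeric

# ave() splits each column by group, applies impute_mean() to every piece,
# and returns the values in their original row order
for (col in c("x1", "x2", "x3")) {
  imputed[[col]] <- ave(imputed[[col]], imputed$group, FUN = impute_mean)
}
imputed
```

In group A, for instance, the observed x1 values are 1 and 3, so the missing x1 value becomes 2. If a group had no observed values at all, mean() would return NaN and the gap would be filled with NaN rather than a number.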
Example 2: Replace NA with Mean by Group in all Numeric Columns
In this example, we will also use the group_by() function so that we can divide our data into groups. Next, we can use the mutate_if() function from the dplyr package in order to replace NA values only in numeric columns. Last, we will use the ifelse() function to identify the missing values and replace them with the mean by group.
data_sample %>%
  group_by(group) %>%
  mutate_if(is.numeric, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
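In dplyr 1.0.0 and later, mutate_if() and mutate_at() are superseded by across(). The following sketch shows how the numeric-only replacement might look in that style, assuming dplyr 1.0.0 or newer and tidyr are installed. The object name result and the final ungroup() call are additions made here for illustration.

```r
library(dplyr)
library(tidyr)

# Recreate the sample data so that this block runs on its own
data_sample <- data.frame(group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                          x1 = c(1, NA, 3, NA, 2, 4, NA, 7, 1),
                          x2 = c(NA, 7, 4, -1, NA, 0, 2, NA, 6),
                          x3 = as.character(c(3, 5, NA, 2, 4, NA, 1, 0, NA)))

result <- data_sample %>%
  group_by(group) %>%
  # where(is.numeric) selects x1 and x2; the non-numeric x3 is left untouched
  mutate(across(where(is.numeric), ~ replace_na(.x, mean(.x, na.rm = TRUE)))) %>%
  ungroup()   # drop the grouping so that later steps work on the whole data
result
```

The numeric columns x1 and x2 no longer contain NA values afterwards, while x3 keeps its missing values because it is not numeric.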
As shown, when our data is divided into groups, we can handle missing values by replacing them with the group mean, either in all columns or exclusively in the numeric columns.
Video, Further Resources & Summary
Do you need more explanations on how to replace the missing values with averages by group in R? Then you should have a look at the following YouTube video of the Statistics Globe YouTube channel.
The YouTube video will be added soon.
Furthermore, you could have a look at some of the related tutorials on Statistics Globe:
- Remove NA Values in Only One Column of Data Frame in R
- Replace NA Values by Row Mean in R
- R Error missing values are not allowed
- Replace NA by FALSE in R
This post has shown how to replace NA values with the average by group in R. In case you have further questions, you may leave a comment below.
This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.