Remove Rows with NA in R Data Frame (6 Examples) | Some or All Missing
In this article you’ll learn how to remove rows containing missing values in the R programming language.
The article consists of six examples for the removal of NA values. To be more precise, Examples 1 to 4 remove rows that contain at least one NA, and Examples 5 and 6 remove only rows in which all values are NA.
Let’s just jump right in.
Example Data
The following data will be used as the basis for this R programming tutorial:
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),                # Create example data
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))
data                                                         # Print example data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 3 NA <NA> NA
# 4  7   XX  1
# 5  8   YO  1
# 6  1   YA NA
As you can see based on the previous output of the RStudio console, our example data frame consists of six rows and three columns. Each of the variables contains at least one NA value (i.e. missing data). The third row is missing in each of the three variables.
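Before removing anything, it can help to count the missing values. The following is a minimal sketch based on the example data of this tutorial; the object names na_per_column and na_per_row are only chosen for illustration:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

na_per_column <- colSums(is.na(data))    # Number of NAs in each column
na_per_row <- rowSums(is.na(data))       # Number of NAs in each row

na_per_column                            # x1 = 1, x2 = 2, x3 = 2
na_per_row                               # 0 1 3 0 0 1
```

The row counts show that only the third row is missing in all three variables.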
Example 1: Removing Rows with Some NAs Using na.omit() Function
Example 1 illustrates how to use the na.omit function to create a data set without missing values. For this, we simply have to insert the name of our data frame (i.e. data) inside of the na.omit function:
data1 <- na.omit(data)                                       # Apply na.omit function
data1                                                        # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1
Have a look at the output of the RStudio console: Our updated data frame consists of three rows and three columns. None of these columns contains NA values.
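If you want to check which rows were dropped, you can inspect the "na.action" attribute that na.omit attaches to its result. This is a small sketch using the example data; the object name removed_rows is just an illustrative choice:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

data1 <- na.omit(data)                                # Remove rows with any NA

removed_rows <- as.integer(attr(data1, "na.action"))  # Positions of dropped rows
removed_rows                                          # 2 3 6
nrow(data) - nrow(data1)                              # 3 rows were removed
```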
Example 2: Removing Rows with Some NAs Using complete.cases() Function
The R programming language provides many different alternatives for the deletion of missing data in data frames. In Example 2, I’ll illustrate how to use the complete.cases function for this task:
data2 <- data[complete.cases(data), ]                        # Apply complete.cases function
data2                                                        # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1
The output is exactly the same as in Example 1. However, this time we have used the complete.cases function instead of the na.omit function.
Note that the complete.cases function has its name because it identifies the complete cases of a data set, i.e. it returns a logical vector that is TRUE for each row without any missing values (sounds logical, doesn’t it?). Subsetting a data frame with this vector is sometimes called “listwise deletion”.
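To make this more concrete, the sketch below prints the logical vector returned by complete.cases and then applies the function to a subset of the columns only. The choice of x1 and x3 and the object names keep_rows and data2_sub are assumptions made purely for illustration:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

keep_rows <- complete.cases(data)        # TRUE for rows without any NA
keep_rows                                # TRUE FALSE FALSE TRUE TRUE FALSE

# Consider only x1 and x3 when checking for missing values
data2_sub <- data[complete.cases(data[ , c("x1", "x3")]), ]
rownames(data2_sub)                      # "1" "2" "4" "5"
```

Row 2 is kept in data2_sub, even though it is still NA in x2, because x2 was not part of the check.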
Example 3: Removing Rows with Some NAs Using rowSums() & is.na() Functions
In this section, I’ll illustrate how to use a combination of the rowSums and is.na functions to create a complete data frame.
data3 <- data[rowSums(is.na(data)) == 0, ]                   # Apply rowSums & is.na
data3                                                        # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1
The output is the same as in the previous examples. However, this R code can easily be modified to retain rows with up to a certain number of NAs. Keep in mind that the logical condition selects the rows that are kept: if you want to remove all rows with 2 or more missing values, you can replace “== 0” by “< 2”.
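The following sketch contrasts the two directions of the condition on the example data. The object names na_count, data3_keep and data3_many are hypothetical and only used here:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

na_count <- rowSums(is.na(data))         # Number of NAs in each row

data3_keep <- data[na_count < 2, ]       # Remove rows with 2 or more NAs
rownames(data3_keep)                     # "1" "2" "4" "5" "6"

data3_many <- data[na_count >= 2, ]      # Keeps ONLY rows with 2 or more NAs
rownames(data3_many)                     # "3"
```

Since the expression inside the square brackets defines which rows stay in the data, a condition such as “>= 2” returns the rows with many missing values instead of removing them.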
Example 4: Removing Rows with Some NAs Using drop_na() Function of tidyr Package
If you prefer the tidyverse instead of the functions provided by the basic installation of the R programming language, this example may be interesting for you.
In this example, I’ll illustrate how to apply the drop_na function of the tidyr package to delete rows containing NAs.
We first need to install and load the tidyr package:
install.packages("tidyr")                                    # Install & load tidyr package
library("tidyr")
Now, we can use the drop_na function to drop rows with missing values as shown below:
data4 <- data %>% drop_na()                                  # Apply drop_na function
data4                                                        # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1
Again, the output is the same as in the previous examples.
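The drop_na function also accepts column names, in which case only those columns are checked for missing values. The sketch below assumes that the tidyr package is already installed; the choice of the column x2 and the object name data4_x2 are just for illustration:

```r
library("tidyr")                         # Assumes tidyr is installed

# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

data4_x2 <- drop_na(data, x2)            # Drop rows only where x2 is NA
nrow(data4_x2)                           # 4
anyNA(data4_x2$x3)                       # TRUE: NAs in x3 were not checked
```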
Example 5: Removing Rows with Only NAs Using is.na(), rowSums() & ncol() Functions
So far, we have removed rows that contain at least one missing value. In Example 5, I’ll show how to remove only rows where all data cells are NA. For this, I’m using the is.na function again (as in Example 3):
data5 <- data[rowSums(is.na(data)) != ncol(data), ]          # Apply is.na function
data5                                                        # Printing updated data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 4  7   XX  1
# 5  8   YO  1
# 6  1   YA NA
In this example, only the third row was deleted. Rows 2 and 6 were kept, since they do also contain non-NA values.
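The same condition can also be expressed by counting the non-missing values of each row. This is only an alternative sketch, not the code used above; the object name data5_alt is an illustrative choice:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

data5 <- data[rowSums(is.na(data)) != ncol(data), ]   # Code of Example 5
data5_alt <- data[rowSums(!is.na(data)) > 0, ]        # Keep rows with at least one valid value

identical(data5, data5_alt)                           # TRUE
```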
Example 6: Removing Rows with Only NAs Using filter() Function of dplyr Package
If we want to drop only rows where all values are missing, we can also use the dplyr package of the tidyverse.
To use the functions of the dplyr package, we first need to install and load dplyr:
install.packages("dplyr")                                    # Install dplyr package
library("dplyr")                                             # Load dplyr package
Now, we can use a combination of the filter function of the dplyr package and the is.na function of Base R:
data6 <- filter(data, rowSums(is.na(data)) != ncol(data))    # Apply filter function
data6                                                        # Printing updated data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 3  7   XX  1
# 4  8   YO  1
# 5  1   YA NA
Note that the previous R code reset the row names of our data, so that they now range from 1 to the number of rows of the updated data set.
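In contrast, the Base R subsetting of Example 5 keeps the original row names. If consecutive row names are preferred there as well, they can be reset manually, as in this small sketch based on the example data:

```r
# Example data of this tutorial
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

data5 <- data[rowSums(is.na(data)) != ncol(data), ]   # Code of Example 5
rownames(data5)                                       # "1" "2" "4" "5" "6"

rownames(data5) <- NULL                               # Reset row names
rownames(data5)                                       # "1" "2" "3" "4" "5"
```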
Video & Further Resources
Would you like to learn more about removing missing data? Then you could watch the video on my YouTube channel, in which I explain the examples of the present tutorial.
Besides that, you may read the related tutorials on my website:
- Remove Empty Rows of Data Frame
- Remove Duplicated Rows from Data Frame in R
- Conditionally Remove Row from Data Frame
- All R Programming Tutorials
This article explained how to remove rows with NA values from a data frame in R. Please let me know in the comments if you have additional questions.
6 Comments
Hi!
Regarding example 3: I used this command:
data3 = 2, ]
And as far as I can see, data3 then contains only rows, which have two or more missings instead of those being removed.
If you do this:
data3 = 2, ]
data3 should consist only of rows which have two or more variables with regular data, which means 5 rows in this case and one missing per row max because it is 3 rows… Am I right?
Hey Marvin,
I’m not sure if I get your problem correctly. So you want to keep each row with at least 2 valid values?
Regards
Joachim
somehow, it is not showing the code line correctly:
1. data3 = 2, ]
2. data3 = 2, ]
Please see my response to your previous comment.
Excellent ! As a beginner, I found it sooooo helpful!
Thanks Joachim.
Keep going with such help materials.
Hi Ali,
Thank you very much for the very kind feedback. It’s great to hear that you like my tutorials!
Regards,
Joachim